AN EVALUATION OF SMALL GROUP WORK
IN A LARGE CLASS 


GOODWIN WATSON 


Teachers College, Columbia University 


The division of large classes into small groups for discussion has 
been practiced at Teachers College for more than thirty years. 
Every teacher who uses small discussion groups has found that 
some students enjoy the experience and rate it as more valuable 
to them than any other phase of the course. Others complain that 
their group got nowhere; that the discussions were a waste of time. 
Few efforts have been made to find out why the group experience 
means so much more to some students than to others. 

This report covers reactions of three hundred and fifty graduate 
students taking a basic required course entitled Education as Per- 
sonal Development. Each student worked in a small discussion 
group. There were forty-two groups, ranging in size from four to 
fifteen members; the average size was eight. A questionnaire and 
pre-test at the beginning of the course provide data which can be 
related to the appraisal each student made of his group experience.¹
On the final evaluation form each student rated his group experi- 
ence for enjoyment, accomplishment, and for what he learned from 
his group. As shown in Table I, the most typical response was: 
Enjoyed the group; good group spirit; fine people; group accom- 
plishment only about average, but I learned more in this group 
than in most courses. About two-thirds of the students gave this 
warm general endorsement. As compared with other class activities 
(Table II) the group discussions were thought to be about equal in 
value to the required readings; less useful than staff lectures, but 
more beneficial than films, panels, written work or supplementary 
reading. 

Combining the several appraisals of small group discussions, we 
selected a ‘high value’ group of fifty-five students and a ‘low value’ 


1 Analysis of the data was aided by a grant from the Faculty Research 
Fund of Teachers College. 
TABLE I. COMPONENTS OF ATTITUDE TOWARD GROUP WORK

                                          Value Rating for Group Work
                                                High  Moderate       Low       All
                                               value     value     value
N =                                               53       185        52       290

Enjoyment (percentage)
Marvelous group; warm fellowship; high            55         9         —        15
  enjoyment
Enjoyed group; good group spirit; fine            45        83        36        68
  people
Enjoyed certain individuals, not so much           —         7        48        13
  the group as a whole
Group only fair; sessions dull; little             —         1        12         3
  group feeling
Did not enjoy                                      —         —         4         1
Total                                            100       100       100       100

Accomplishment (percentage)
Very proud; extraordinary                          8         —         —         1
More than most                                    76        25         4        30
About average                                     16        69        56        58
Less than average                                  —         4        25         7
Nothing but talk                                   —         2        15         4
Total                                            100       100       100       100

Learning (percentage)
As educative as any experience I ever had         48        20         2        21
More than most courses                            48        47        17        42
Like average course                                4        27        40        25
Less than average course                           —         5        35        10
Complete waste of time                             —         1         6         2
Total                                            100       100       100       100

Rank for Value Among Class Procedures (percentage)
1 (Best)                                          77        10         —        21
2                                                 17        25         2        19
3                                                  6        20         —        14
4                                                  —        24         6        16
5                                                  —        16        23        14
6                                                  —         4        33         9
7 (Poorest)                                        —         1        36         7
Total                                            100       100       100       100
group of fifty-one students. Because of the generally favorable
attitude toward small group sessions, the ‘low value’ group neces- 
sarily includes some who, as shown in Table I, rated their group as 
enjoyable, its accomplishment as ‘average’ and their own learning 
as at least equal to that in an average course. On the whole, however,
(Table II) the ‘high value’ students ranked their group experience
higher than any other course activity, while the ‘low value’
students ranked it below every other course activity. The rest of 
this report will be concerned with hypotheses designed to predict 
the ‘high’ and ‘low’ evaluations. First, we shall examine four
hypotheses which concern the group as a whole, employing a ‘molar’
rather than ‘molecular’ concept of group value.

TABLE II. COMPARATIVE EVALUATION OF COURSE ACTIVITIES

                                          Mean Rank Assigned Each of Seven
                                          Kinds of Course Activity by
                                          Students Giving Group Work:
                                                High  Moderate       Low       All
                                               value     value     value
N =                                               55       243        51       349

1) Staff lectures                               2.80      2.25       1.6      2.23
2) Required readings                            4.40      3.33      3.06      3.43
3) Group discussions                            1.34      3.42      6.03      3.47
4) Moving pictures                              4.70      4.19      4.02      4.23
5) Panel discussions                            4.47      4.59      4.12      4.51
6) Writing professional diary                   4.58      4.64      4.72      4.64
7) Voluntary, supplemental reading              5.59      4.94      4.37      4.95

Hypothesis 1. Groups are homogeneous in judging the value of 
their experience; good groups tend to be rated high by all members; 
poor groups tend to be rated low by all members; the differentiating 
factors must be sought not in personal characteristics of individuals 
but in some features common to the whole group.—Actually, there 
were no groups in which all members agreed in attributing either 
‘high’ or ‘low’ value to their group experience. The hypothesis of 
homogeneity would lead us to expect ‘high’ rating members to be 
concentrated in perhaps seven or eight excellent groups; the ‘low’ 
rating members in an unfortunate seven or eight. Actually the 
fifty-five ‘high’ raters were distributed among twenty-four different 
groups; the fifty-one ‘low’ raters were found scattered through 
twenty-six different groups. The distribution of high’s and low’s
throughout the groups did not differ significantly from what chance 
alone would have given. There were eleven groups in which one 
or more members gave a ‘high’ rating while one or more others 
gave the same group experience a ‘low’ rating. 

The first hypothesis must be rejected. The students who get 
most out of their group work are not concentrated in certain out- 
standing groups, nor is a ‘poor group’ the main reason for low 
ratings. Value judgments by members in the same group vary almost
as widely as do those of individuals selected at random from
various groups.


TABLE III. SIZE OF GROUP AND VALUE RATING

                                                  Value Rating*
Size of Groups    No. of Groups         High  Moderate       Low       All
4-6                    12                  6        41         5        52
7-9                    15                 11        82        28       121
10-15                  15                 36       109        19       164
Total                  42                 53       232        52       337

* Distributions significantly different at .01 level.


Hypothesis 2. Average value rating will decrease as groups increase
in size.—The hypothesis rests upon Bales’ observation that as
groups grow from four to fifteen members, there is increasing ten- 
dency for one member to dominate and for some to remain only 
listeners. Our smaller groups might therefore be expected to have 
more nearly equal participation by their members, and hence a 
higher level of satisfaction. 

The data reported in Table III show a statistically significant
relationship, but not in the expected direction. The large groups
had proportionately more ‘high’ ratings. In the one group as large
as fifteen members, eight reported high satisfaction. One group of
eleven members produced eight ratings of ‘high’, yet another group
of the same size produced five ‘low’ ratings and no ‘high’. The
groups with the most low ratings included neither the largest nor
the smallest groups but were predominantly of average size; the
smallest groups yielded more moderate and fewer extreme appraisals.
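
The significance reported for Table III can be checked with a Pearson chi-square test of independence. The sketch below is an illustration in Python, not the author's original computation. The function name `chi_square` is hypothetical, the counts are those printed in Table III, and 13.277 is the standard tabled critical value at the .01 level for four degrees of freedom.

```python
# Counts from Table III: rows are group sizes 4-6, 7-9, 10-15;
# columns are High, Moderate, Low value ratings.
table_iii = [
    [6, 41, 5],
    [11, 82, 28],
    [36, 109, 19],
]


def chi_square(observed):
    """Return (statistic, degrees of freedom) for a Pearson chi-square
    test of independence on a rectangular table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if size and rating were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


statistic, dof = chi_square(table_iii)
CRITICAL_01_DF4 = 13.277  # tabled chi-square value, p = .01, 4 d.f.
significant = statistic > CRITICAL_01_DF4
print(f"chi-square = {statistic:.2f}, d.f. = {dof}, "
      f"significant at .01: {significant}")
```

The same function could be applied to any of the other count tables in this report, with the critical value chosen for the appropriate degrees of freedom.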

The third molar hypothesis concerns two types of work carried 
on by these groups. A random half of the groups was arbitrarily
directed to select a single problem of special interest to members 
and to work continuously on that subject all semester. The remain- 
ing groups were directed to discuss in their group each week some 
issue arising from the accompanying class session; hence they talked 
about a different question at each meeting. 

Hypothesis 3. Groups working continuously and cumulatively on a 
single project will be experienced as more valuable than are groups 
discussing a variety of issues more superficially; the difference in 
their sense of accomplishment will be especially marked.—The re- 
sults, presented in Table IV, show no advantage for either type 
of group, in enjoyment, accomplishment, learning, or in the com- 
posite value judgment. 

A fourth molar hypothesis is that those students who find them- 
selves in a group which is generally able and well-advanced will 
get more stimulation and will find more value in their group than 
will be the case for students who happen to draw a group made up 
of the less competent. One indication of competence is the score made
at the beginning of the term on a test of thirty items central to the
course content.

Hypothesis 4. Level of satisfaction with the group will vary with
the group’s average level of competence in course knowledge.—Actually,
‘high value’ ratings came from members of groups which
averaged 22.1 on the pre-test; ‘moderate value’ members belonged
to groups averaging 22.4; while ‘low value’ members came from
groups averaging 21.8. The differences are not statistically
significant. The hypothesis cannot be defended with these findings.
Whatever it was that made a group ‘good’ was not measured by the
pre-test. Relation of satisfaction with group work to an individual’s
own pre-test score will be considered later (Hypothesis 18).

We turn now to hypotheses concerning individual personality 
differences which may be expected to be associated with a liking 
for small group discussion. An obvious expectation would be that 
individuals (especially by the time they have become graduate 
students of education) themselves know whether they find group 
work profitable or not. Two hypotheses arise. 

Hypothesis 5. Students who declare in advance that they “usually
prefer working as a member of a coöperative group” will find their
group experience more rewarding than will those who usually “prefer
working individually”.
The Journal of Educational Psychology

TABLE IV. TYPE OF GROUP PROGRAM AND VALUE RATING*

                                                    Single   Various     Total
                                                     topic    topics
A. Enjoyment
   1) Marvelous group; warm fellowship;                 28        36        64
      high enjoyment
   2) Enjoyed group; good group spirit; fine           106       115       221
      people
   3) Enjoyed certain individuals; not so               22        20        42
      much the group as a whole
   4) Group only fair; sessions dull; little             3         7        10
      group feeling
   5) Did not enjoy the group experience                 2         1         3
   Total                                               161       179       340

B. Accomplishment
   1) Very proud of extraordinary achievements           3         4         7
   2) More than most other groups                       49        49        98
   3) About average                                     89       107       196
   4) Less than average                                 12        13        25
   5) Nothing but talk                                   6         7        13
   Total                                               159       180       339

C. Learning from group
   1) As educative as any experience I ever             34        34        68
      had
   2) Learned more than in most courses                 69        73       142
   3) Learned about as much as in average               36        51        87
      course
   4) Learned less than in average course               17        17        34
   5) A complete waste of time so far as new             4         2         6
      learning is concerned
   Total                                               160       177       337

D. Composite value judgment
   Rated high in value                                  25        27        52
   Rated moderate in value                             111        76       187
   Rated low in value                                   24        27        51
   Total                                               160       130       290

* Distributions not significantly different.
Hypothesis 6. Students who report in advance that they usually
take a leading, active rôle in group discussion will place higher value
upon group work than will those who are usually reticent and let
others do the talking.

The data fail to support either hypothesis. The response “Prefer
working as a member of a coöperating group” was given at the
beginning of the course by seventy-two per cent of those who rated
their group experience low in value, and by seventy-four per cent
of those who rated it high. As shown in Table V, those who claimed
an active rôle in discussions were no better satisfied than were
those who declared their participation to be average or usually
deficient. Apparently we won’t get far toward meeting student
needs by following their advance expression of preference for group
versus individual work.

TABLE V. USUAL RÔLE IN GROUP DISCUSSION AND VALUE RATING*

                                          Value Rating for Group Work
Usual rôle                                      High  Moderate       Low       All
Active; take a lead                               14        49        12        75
About average participation                       37       166        35       238
Reticent; let others talk                          2        26         4        32
Total                                             53       241        51       345

* Distributions not significantly different.

Further light on the weight to be given to preference expressed
in advance comes from analysis of student reactions to the two
types of group procedure. They were asked in advance whether
they preferred “a group which takes up a different problem each
week touching on many areas during the term”, or “a group which
concentrates on one problem area for the entire term, giving a
more thorough grasp of a limited topic.” We have already reported
that the two types of group proved equally satisfying, but the 
question here concerns the relation of advance preference to even- 
tual satisfaction. Student preference was ignored in making the 
group assignments, an arbitrary procedure which aroused protest 
from a few students, and which was defended as necessary for the 
larger experimental design. Two hypotheses appear plausible in 
this connection: 
Hypothesis 7. Students who prefer the wider variety of issues are
more likely to favor group discussion; students who prefer to concen- 
trate and do more thorough work on a single problem are less likely to 
be satisfied with a discussion group. 


TABLE VI. TYPE OF GROUP PROCEDURE PREFERRED AND VALUE RATING
GIVEN GROUP WORK*

                                          Value Rating for Group Work
Preference                                      High  Moderate       Low       All
Strongly prefer various topics                    22        97        14       133
Mildly prefer various topics                      19        62        18        99
Don’t care                                         0         9         2        11
Mildly prefer single project                      10        37         5        52
Strongly prefer single project                     4        34        10        48
Total                                             55       239        49       343

* Distributions not significantly different.


TABLE VII. ASSIGNMENT WITH OR AGAINST EXPRESSED PREFERENCE AND
SUBSEQUENT VALUE RATING GIVEN GROUP WORK*

                                          Value Rating for Group Work
Relation of Assignment to Previous Choice       High  Moderate       Low     Total
In accord with strong preference                  11        48        15        74
In accord with mild preference                    18        42        16        76
Expressed no preference                            0         9         2        11
Counter to mild preference                        13        31         6        50
Counter to strong preference                      10        51        13        74
Total                                             52       181        52       285

* Distributions not significantly different.


Hypothesis 8. Students assigned to the type of group they prefer
will have a more satisfying experience than will students assigned in
contradiction to their preference.

As reported in Table VI, there is only very weak support for
the seventh hypothesis. A difference in the expected direction
appears only in the categories of strong preference, and the cases
there are so few that statistical confidence is unwarranted.


From Table VII it may be observed that the distribution of 
satisfaction among the seventy-four students assigned in accord
with a strong preference is almost identical with the satisfaction 
distribution among those assigned counter to a strong preference. 
Hence Hypothesis 8 must be rejected. 
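
The comparison can be made concrete by converting the two rows of Table VII to percentages. The following Python sketch is illustrative only: the helper name `percentages` is hypothetical, and the counts are those printed in the first and last rows of Table VII.

```python
# Rows of Table VII as (High, Moderate, Low) counts.
in_accord_strong = [11, 48, 15]  # assigned in accord with strong preference
counter_strong = [10, 51, 13]    # assigned counter to strong preference


def percentages(counts):
    """Convert a list of counts to percentages of their total."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]


accord_pct = percentages(in_accord_strong)
counter_pct = percentages(counter_strong)

# Largest gap, in percentage points, between the two distributions.
max_gap = max(abs(a - b) for a, b in zip(accord_pct, counter_pct))

for label, a, b in zip(["High", "Moderate", "Low"], accord_pct, counter_pct):
    print(f"{label:<9} in accord: {a:5.1f}%   counter: {b:5.1f}%")
print(f"largest gap: {max_gap:.1f} percentage points")
```

On these counts no rating category differs between the two rows by more than about four percentage points.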

Another source of data concerning individual personality charac- 
teristics is a check-list on which students, at the beginning of the 
term, registered the emphasis they wished to give to each of eight 
proposed course topics. Previous investigations have shown that 
some persons are prone to accept doubtful statements while others 
more often reject such statements or remain uncertain. It seemed 
possible that analogously there might be some students prone to 
accept and to approve whatever the teacher offered, and others 
generally negatively disposed. The former might be expected to 
respond to many proposed topics with enthusiasm and also to re- 
port favorably on their group experience. The negativists would 
similarly be skeptical of the value both of course topics and of group 
work. Hence: 

Hypothesis 9. Group value ratings are positively correlated with
ratings of course topics for anticipated value.—Before the data are
presented, three other hypotheses will be introduced, since these 
depend upon the same preliminary check-list. 

Hypothesis 10. Group-prone personalities will give higher advance
rating to the topic, “Procedures in group leadership”.

Hypothesis 11. Individualists have more difficulty in inter-personal
relations; this will be expressed in higher advance ratings for:
“Overcoming inferiority or inadequacy feelings”; “Fears and
anxieties”; “Sex adjustment”; and “Psychotherapy”.

Hypothesis 12. Students who express more interest in proposed 
topics such as techniques for discipline, testing and guidance, reveal 
a manipulative attitude toward others and may be expected to rate 
group work lower. 

The data of Table VIII fail to support most of these hypotheses. 
Hypothesis 9 fails since, in the last two lines of the table it appears 
that the average student gave 3.2 single checks and 1.45 double- 
checks and that this holds true of all groups with no significant 
deviation. 

The topic of “Procedures in group leadership” was ranked in 
sixth place by the group-prone persons and in fourth place by the 
individualists; a difference counter to our hypothesis but not statis- 
tically significant. 
TABLE VIII. TOPICS PREFERRED FOR EMPHASIS IN COURSE

Topics (in order of preference), by Value Rating for Group Work

Discipline: how combine order and freedom in practical class
  situation
Techniques of guidance and counseling
Overcoming feelings of inferiority or inadequacy in inter-personal
  relations
Fears and anxieties; how they arise and are removed*
Procedures in group leadership
Value of psychotherapy for personal development
Problems of sex adjustment in the modern world
Tests (intelligence, aptitude, interests, etc.) useful in studying
  pupil needs

[Score and rank columns not recoverable from the source. Surviving
entries, in the order found: 87 (4); 89 (3); 98 (1); 69 (5); 92 (1);
63 (6); 67 (5); 57 (8); 59 (7); 89; 73; 72.]

Av. no. of double checks: 1.44; 1.46; 1.45
Av. no. of single checks: 3.19; 3.30; 2.92; 3.23

* Chi Square test of distribution of checks and double-checks on topic
of fears and anxieties shows difference among groups significant at .01 level.
No other significant differences.


On the three items related to possible emotional difficulty in
inter-personal relations only one (“value of psychotherapy”) is in
the direction of our hypothesis, and the one difference significant
at the .01 level of confidence shows the group-prone personalities
giving higher rating to study of “fears and anxieties; how they
arise and are removed”. The results suggest that, quite contrary
to our original hypothesis, concern over inferiority, inadequacy in
personal relations, fears and anxieties may lead to greater
satisfaction in the experience of being an accepted group member.

The findings on “manipulative approach” do not reveal any
significant difference. “Testing” is given a low rating by both
‘High’s’ and ‘Low’s’ in attitude toward group experience; “Discipline”
rates high for both groups and what slight difference there
is runs counter to our hypothesis, but is not statistically significant.
The difference on “Guidance and Counseling” is even smaller.
None of our hypotheses concerning personality factors in prefer- 
ence for group experience is substantiated by these data on choice 
of topics. 

Further exploration of personality and background is possible on
the basis of a pre-test of thirty questions given to all students at 
the opening of the course. The first six of the thirty questions were 
chosen as ‘best items’ from the Levinson-Sanford F Scale, measur- 
ing what has come to be known as the ‘Authoritarian Personal- 
ity’. Hypotheses 13-17 can be checked against corresponding an- 
swers on this Pre-Test. 

Hypothesis 13. High scores on the F scale items will indicate students
who do not easily adapt to coöperative, democratic relationships
and hence will be associated with dissatisfaction with group experience.

The results on the brief F scale, as shown in Table IX, do not
accord with Hypothesis 13. The association (.02 level of confidence)
is in the opposite direction. The few strongly authoritarian scores
(there were only sixteen cases) were concentrated largely in the
‘Moderately well satisfied’ category. The low authoritarians (zero
score) were distinctly more apt to be found in the group least well
satisfied with their group experience!

Other items on the pre-test were grouped by advance inspection 
to permit the testing of hypotheses related to dependence of effec- 
tive group work on sensitivity to emotional needs of others; ab- 
sence of hostility; low self-reliance; and low ‘intellectualism’. 

Hypothesis 14. Seven pre-test items apparently indicative of sensi- 
tivity toward and concern over the emotional needs of others will iden- 
tify students who rate group work higher. 

Hypothesis 15. Four pre-test items apparently indicative of hos- 
tility and distrust toward others will identify students who rate group 
work lower. 
Hypothesis 16. Two pre-test items apparently indicative of individualism
and self-reliance will identify students who rate group work
lower.

Hypothesis 17. Five pre-test items apparently indicative of ‘intel- 
lectualism’ will identify students who rate group work lower. 

TABLE IX. ‘AUTHORITARIAN CHARACTER’ AND VALUE RATING GIVEN
GROUP WORK

                                          Value Rating for Group Work
‘Authoritarian Score’                           High  Moderate       Low     Total
High authoritarian (4-5)*                          0        14         2        16
3*                                                 5        21         3        29
2                                                 16        47         6        69
1                                                 21        66        18       105
Non-authoritarian 0                               11        37        23        71
Total                                             53       185        52       290

χ² = 15.44, significant at .02 level.
* For Chi Square computation, scores of 3, 4 and 5 were combined in a
single category and Yates’ correction applied to the smallest theoretical
frequency. There were no scores of 6—the maximum.

The items pertinent to Hypothesis 14 were:

a) “Feeding a child whenever he is hungry contributes to a sense of
confidence in the world.”
Agree: seventy-three per cent of ‘High Value’; sixty per cent of ‘Low
Value’ (Distributions not significantly different)

b) “Behavior may be affected significantly by what a person believes
to have happened even though the event never took place.”
Almost no disagreement.

c) “During the first year or two of life a child is called upon to surrender
psychological autonomy in favor of culturally prescribed patterns.”
Agree: fifty-four per cent of ‘High Value’; sixty-five per cent of ‘Low
Value’ (Distributions not significantly different)

d) “Each child acquires a unique version of the broader culture; his
private world is idiomatic.”
Agree: fifty per cent of ‘High Value’; sixty-three per cent of ‘Low
Value’ (Distributions not significantly different)

e) “An educator who gave a present instead of punishment to a child
who had been stealing, probably made that child’s delinquent behavior
more likely in the future.”
Disagree: sixty per cent of ‘High Value’; fifty-six per cent of ‘Low
Value’ (Distributions not significantly different)

f) “Adolescent attitudes are more commonly influenced by the peer
group than by the parent.”
Agree: seventy-seven per cent of ‘High Value’; eighty-three per cent
of ‘Low Value’ (Distributions not significantly different)

g) “Aggression almost invariably arises as the result of some frustration.”
Agree: eighty-one per cent of ‘High Value’; seventy-one per cent of
‘Low Value’ (Distributions not significantly different)


Combining the seven items which appear to involve some sym- 
pathetic awareness of the emotional needs of others, we obtain 
practically identical scores for those who rated their group experi- 
ence ‘High’ and those who found it of much less value. Hypothesis 
14 is clearly not supported by these data. 

To test Hypothesis 15, we inspect responses to the following 
items. 


a) “The world is a hazardous place in which men are basically evil and
dangerous.”
Agree: two per cent of ‘High Value’; two per cent of ‘Low Value’
(Distributions not significantly different)

b) “Young people today are much ‘wilder’ than they used to be.”
Agree: thirteen per cent of ‘High Value’; four per cent of ‘Low Value’
(Distributions not significantly different)

c) “Most pupils prefer to be lazy and will exert themselves on difficult
tasks only under adult encouragement or pressure.”
Agree: nine per cent of ‘High Value’; thirteen per cent of ‘Low Value’
(Distributions not significantly different)

d) “Human nature the world over exhibits very much the same competitive
attitudes regardless of differences in social ideals or customs.”
Agree: fifty-four per cent of ‘High Value’; fifty-seven per cent of ‘Low
Value’ (Distributions not significantly different)


Again, no significant differences appear. The ‘Low Value’ stu- 
dents in this class seem not to be of the type which projects hos- 
tility onto ‘people in general’. 

How now about self-reliance? We have already reported that 
those who usually prefer to work alone rather than as a group 
member do not give lower ratings to their group experience. Two 
additional items bear on the same hypothesis. 


a) “The self-reliant, completely independent person should be our
educational objective.”
Agree: fifteen per cent of ‘High Value’; eight per cent of ‘Low Value’
(Distributions not significantly different)

b) “Heredity determines within narrow limits the pattern of behavior
possible to any individual.”
Agree: thirty-seven per cent of ‘High Value’; thirty-eight per cent of
‘Low Value’ (Distributions not significantly different)


No support for Hypothesis 16 can be found in these answers. 

The last of the hypotheses on personality (#17) dealt with 
‘Intellectualism’ in education as associated with rejection of learn- 
ing as a group member. The following responses are pertinent. 


a) “The ideal aim of education is to enable the individual to subordinate
his emotion to his intellect.”
Agree: twenty-five per cent of ‘High Value’; thirteen per cent of ‘Low
Value’ (Distributions not significantly different)

b) “While nursery schools and kindergartens may be attractive to
children or to parents, very little of educational importance can be achieved
before the child is six.”
None agree.

c) “Many social attitudes depend much less upon the schooling which
one has had than upon his membership within the class structure of his
community.”
Disagree: four per cent of ‘High Value’; fourteen per cent of ‘Low Value’
(Distributions not significantly different)

d) “Verbal skills which tend to be significantly related to success in
schools are very slightly related to success in non-school activities.”
Disagree: fifty-eight per cent of ‘High Value’; sixty-seven per cent of
‘Low Value’ (Distributions not significantly different)

e) “Persons showing high agreement with staff responses to these items
will not be identical with persons who can deal most effectively with
practical educational problems.”
Disagree: thirty-four per cent of ‘High Value’; thirty-two per cent of
‘Low Value’ (Distributions not significantly different)


Hypothesis 17 on ‘intellectualism’ is not borne out by these
items; there are no differences large enough for confident prediction,
and on two of the five statements there is slightly more
‘intellectualism’ apparent in the ‘High’ group than in the ‘Low’.

On the basis of the ‘Individualism’ and the ‘Intellectualism’ hy- 
potheses, we might expect the students who do not care for group 
work to give noticeably higher ratings to lectures, individual read- 
ing assignments and to writing their own professional diary. Re- 
turning for a moment to Table II, where ranks on these activities 
were reported, we discover that, after we have corrected for the 
effect upon other ranks of the high and low estimate given to 
‘Group Discussions’, the other course activities stand in about the 
same order of preference. Largest differences are a liking for films 
by the ‘Low Value’ group, and approval of the professional diary 
by the ‘High Value’ group. Neither of these deviations would support
the hypotheses that the group-resistant students were more
bookish or more interested in independent work. 
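
The correction described above amounts to setting ‘Group discussions’ aside and re-ranking the remaining six activities within each set of students. A sketch in Python follows, using the mean ranks printed in Table II. The names `mean_ranks` and `rerank` are hypothetical, and re-ranking by mean rank is one plausible reading of the correction, not necessarily the procedure the author used.

```python
# Mean ranks from Table II as (High value, Low value) pairs.
mean_ranks = {
    "Staff lectures": (2.80, 1.6),
    "Required readings": (4.40, 3.06),
    "Group discussions": (1.34, 6.03),
    "Moving pictures": (4.70, 4.02),
    "Panel discussions": (4.47, 4.12),
    "Writing professional diary": (4.58, 4.72),
    "Voluntary, supplemental reading": (5.59, 4.37),
}


def rerank(column):
    """Order the activities other than group discussions by mean rank
    (best first) and return {activity: position}."""
    others = {
        name: ranks[column]
        for name, ranks in mean_ranks.items()
        if name != "Group discussions"
    }
    ordered = sorted(others, key=others.get)
    return {name: position for position, name in enumerate(ordered, start=1)}


high_order = rerank(0)
low_order = rerank(1)

for name in high_order:
    shift = low_order[name] - high_order[name]
    print(f"{name:<32} High: {high_order[name]}  "
          f"Low: {low_order[name]}  shift: {shift:+d}")
```

Under this reading, lectures and required readings hold first and second place for both sets of students, and the largest shifts in position fall on the films and the professional diary.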

Consistently we have been disappointed in our quest. Specific 
answers on the pre-test do not provide the least support for ap- 
parently plausible hypotheses concerning the persons who will 
profit most from small group participation. A possible explanation 
is that the items chosen are too few and too slightly related to the 
trait name assigned them. It is clearly not possible to obtain a 
highly reliable and valid measure of personality traits when using 
only three or four self-report items, none of which has been vali- 
dated. Perhaps the test as a whole will be more indicative than 
its parts. We hypothesize: 

Hypothesis 18. Students with a superior background in modern 
psychology and education will be better able to profit from small group 
discussions in a course in educational psychology.—Two kinds of 
evidence bear on this hypothesis. One is the score on the pre-test; 
the other the total credit-hours from previous courses in psychology 
and education. 

Table X indicates no significant difference between the ‘High 
Value’ and ‘Low Value’ individuals on their Pre-Test scores. 
Amount of previous study in courses in psychology and education 
ranged from none to over sixty credit hours. Table XI indicates 
that this factor also was unrelated to evaluation of group experi- 
ence. 

The next group of hypotheses concerns differences in the particular
aspects of group experience found especially satisfying or
frustrating. All students ranked in order of importance five factors
contributing to the success of their group and ten factors which 
might have been detrimental. Table XII shows that, for the class 
as a whole, the most valuable thing about their group experience 
was ‘Stimulation of ideas coming from others’, while ‘Chance for 
self-expression’ was rated least important to them. 

What differences might be expected between ‘High’s’ and 
‘Low’s’? Since all items were ranked by all students the results 
will not reflect general level of satisfaction but only relative differ- 
ences among the five satisfying aspects. It might be hypothesized 
that when a group goes well, little attention is given to technique. 
When difficulties arise, process is more closely analyzed. Hence: 

Hypothesis 19. Students rating their group ‘Low’ in value will 
attribute relatively more of such value as it has, to “learning about
group process”.

A correlative hypothesis would be that a by-product of good 
group discussion is a warm sense of belonging and togetherness. 
Hence it might be anticipated that: 


TABLE X. PRE-TEST SCORES AND VALUE RATING GIVEN
GROUP WORK

                                          Value Rating for Group Work
Pre-test Scores                                 High  Moderate       Low     Total
                                               value     value     value
26-30                                              4        12        10        26
21-25                                             27        65        24       116
16-20                                             19        79        14       112
15 or less                                         3        29         4        36
Total                                             53       185        52       290

(Distributions not significantly different.)


TABLE XI. PREVIOUS STUDY OF PSYCHOLOGY AND EDUCATION IN RELATION
TO VALUE RATING GIVEN GROUP WORK

Credit Hours in                 Value Rating for Group Work
Psychology and Education      High    Moderate    Low     All
0-6                             4        20         3      27
7-20                           14        62        12      88
21 and over                    36       156        34     226
Total                          54       238        49     341

(Distributions not significantly different.)


Hypothesis 20. Students rating their group ‘High’ in value will 
attribute relatively more of its worth to “enjoying group fellowship.” 

Examination of the data in Table XII does not bring much sup- 
port for either hypothesis. On “learning about group process” the 
‘Low’s’ do provide sixteen ‘First’ rankings to only seven ‘First’s’ 
from the ‘High’s’, a difference 2.3 times its standard error. ‘Sec- 
ond place’ rankings, however, reverse the direction; and for the 
distribution as a whole p falls between .20 and .10. 
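
The figure of 2.3 standard errors can be retraced from the ‘First’ rankings in Table XII. The following is a minimal sketch, not the author's own computation: it assumes the unpooled standard error of a difference between two independent proportions, and the function name is hypothetical.

```python
from math import sqrt


def difference_over_standard_error(x1, n1, x2, n2):
    """Return (p1 - p2) divided by the unpooled standard error of the difference."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


# 'First' rankings for learning about group procedure (Table XII):
# 16 of 51 'Low' students versus 7 of 54 'High' students.
ratio = difference_over_standard_error(16, 51, 7, 54)
print(round(ratio, 1))  # prints 2.3
```

A pooled standard error gives nearly the same ratio, so the choice of formula does not change the reading of the result.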

On “enjoying group fellowship” the distributions do not differ
significantly.

The one difference which does emerge clearly, in Table XII, concerns
“Stimulation of ideas coming from others”. This is in first
place for the total group but ranks fourth among the five factors


TABLE XII. FACTORS CONTRIBUTING MOST TO VALUE OF GROUP EXPERIENCE

                        Value Rating for Group Work
Rank Order            High    Moderate    Low     All

Stimulation of ideas coming from others
1                      16        61         7      84
2                      20        74        10     104
3                       7        52        13      72
4                       8        28         9      45
5                       4        24        11      39
Total                  55       239        50     344
(Differences significant at better than .01 level.)

Getting to know people of different backgrounds
1                       9        63        13      85
2                      12        49        17      78
3                      17        39        10      66
4                      10        54        10      74
5                       7        34         1      42
Total                  55       239        51     345
(Distributions not significantly different.)

Enjoying group fellowship
1                      16        41        10      67
2                       9        54         9      72
3                      12        75        12      99
4                      14        44        13      71
5                       4        23         7      34
Total                  55       237        51     343
(Distributions not significantly different.)

Learning about group procedure
1                       7        44        16      67
2                      11        37         7      55
3                       8        34         7      49
4                       9        57        10      76
5                      19        67        11      97
Total                  54       239        51     344
(Differences not significant: p>.10 and <.20.)

Chance for more self-expression
1                       8        27         4      39
2                       3        23         8      34
3                      11        39         9      59
4                      12        54         9      75
5                      20        95        20     135
Total                  54       238        50     342
(Distributions not significantly different.)


according to ratings from the ‘Low’ group. One thing most of them 
meant by their ‘low’ rating was that they didn’t get much intel- 
lectual stimulation from their fellows. This finding supports the 
validity of the value rating and suggests a possible reason why our 
earlier hypotheses concerned with emotional readiness for group 
fellowship were not sustained. The instructors presented the group 
as offering opportunities for satisfying experiences in interper- 
sonal relations; the students however persevered in the traditional 
set of seeking useful information from their classmates. Finding 
that the other group members did not contribute much in the 
line of intellectual stimulation, some students apparently regarded 
their group experience as disappointing. It should be remembered, 
however, that those who, on the basis of low pre-test score and 
few previous courses, might have been expected to have most to 
learn from their fellows were not significantly more appreciative.

Turning now to rankings accorded factors which hindered the
group work, the over-all results (Table XIII) show the students

TABLE XIII. FACTORS DETRIMENTAL TO GROUP EXPERIENCE

                                 Value Rating for Group Work
Detrimental factors            High    Moderate    Low     All

1. Inexperience in group process
   Rank: 1-2                    14        85        22     121
         3-4                    18        69        15     102
         5-6                     8        37         8      53
         7-8                     7        31         4      42
         9-10                    5        15         2      22
         Total                  52       237        51     340
(Distributions not significantly different.)

2. Lacked solid content
   Rank: 1-2                     7        67        17      91
         3-4                     8        59        17      84
         5-6                    21        56         9      86
         7-8                     8        37         5      50
         9-10                    7        18         3      28
         Total                  51       237        51     339
(Distributions different at better than .01 level of significance.)

3. Poor physical setting
   Rank: 1-2                    14        64         6      84
         3-4                    15        37         8      60
         5-6                     5        43        10      58
         7-8                    11        49        15      75
         9-10                    6        42        10      58
         Total                  51       235        49     335
(Distributions different with p>.01 and <.02.)

4. Inadequate leaders in group
   Rank: 1-2                     5        44        20      69
         3-4                    14        50        13      77
         5-6                     8        58        12      78
         7-8                    14        50         5      69
         9-10                   10        34         1      45
         Total                  51       236        51     338
(Distributions different at better than .01 level of significance.)

5. Program too crowded
   Rank: 1-2                    16        52         6      74
         3-4                    10        55         5      70
         5-6                    13        61        12      86
         7-8                    10        43        20      73
         9-10                    3        25         6      34
         Total                  52       236        49     337
(Distributions different at better than .01 level of significance.)

6. Too little talk by some
   Rank: 1-2                     9        40         8      57
         3-4                    16        69        11      96
         5-6                     8        67        17      92
         7-8                    13        46        11      70
         9-10                    5        14         3      22
         Total                  51       236        50     337
(Distributions not significantly different.)

7. Too much talk by some
   Rank: 1-2                    10        41        10      61
         3-4                     5        53        14      72
         5-6                    13        61        16      90
         7-8                    17        55         9      81
         9-10                    6        27         2      35
         Total                  51       237        51     339
(Distributions not significantly different.)

8. Meeting time too short
   Rank: 1-2                    22        40         0      62
         3-4                     9        32         6      47
         5-6                    11        28         4      43
         7-8                     5        48        13      66
         9-10                    3        86        26     115
         Total                  50       234        49     333
(Distributions different at better than .01 level of significance.)

9. Too little staff direction
   Rank: 1-2                     4        71        11      86
         3-4                     7        64        10      81
         5-6                     8        47        10      65
         7-8                    11        40        11      62
         9-10                   20        11         7      38
         Total                  50       233        49     332
(Distributions different at better than .01 level of significance.)

10. Too much staff direction
   Rank: 1-2                     5        10         2      17
         3-4                     1         8         2      11
         5-6                     5        12         3      20
         7-8                     4        35         6      45
         9-10                   35       167        36     238
         Total                  50       232        49     331
(Distributions not significantly different.)


inclined to place most blame on their own “inexperience in group
process”.

The second criticism in order of importance was “not enough
solid content in the discussions”. Students on the whole did not
feel that there had been too much or too little staff direction. Al- 
though training for group discussion often emphasizes the handi- 
cap imposed by the member who talks too much or too little, 
these were not felt by this class to be major liabilities. 

Differential rankings of the ten potential liabilities to good group 
functioning may be analyzed with reference to a hypothesis that 
those well satisfied will play down the vital criticisms and play up 
the more external and less significant liabilities, while those dis- 
satisfied will regard the vital and significant factors as major lia- 
bilities. More specifically: 

Hypothesis 21. Members who rate group value as ‘Low’ will give
relatively more emphasis to charges that the discussion lacked solid
content and good leadership; those who rate group value as ‘High’
will give top rank to such ‘external’ criticisms as “Meeting time too
short”, “Physical setting poor”, and “Schedule too crowded”.

The data of Table XIII fully support this hypothesis. There were 
statistically significant differences on all of the items mentioned, 
and consistently in the predicted direction. 

A few more hypotheses may be derived from consideration of 
certain personal data. For instance, the anthropologist Ashley 
Montagu in recent articles has argued that women are basically 
less aggressive and more coöperatively disposed than are men. So
we might predict that: 

Hypothesis 22. Women will value their group experience more 
highly than will men. 


TABLE XIV. SEX DIFFERENCE AND VALUE RATING GIVEN GROUP WORK

                  Value Rating for Group Work
                High    Moderate    Low    Total
Men              42       132        22     196
Women            13       111        29     153
Total            55       243        51     349

(Distributions different at .01 level.)


The data in Table XIV reveal a statistically significant difference,
but the difference shows a higher proportion of men in the
category of ‘High approval’. Our sex-difference hypothesis must be 
reversed. Why were the men better satisfied? One possibility would 
be that when groups are formed, as these were, by a process which 
leaves it to students somehow to group themselves, the men take 
more initiative and are less apt to be drawn in with unwanted 
companions. Another factor may be the greater deference shown 
to the males by members of a mixed group in our culture. Still 
another explanation may be that, contrary to the stereotypes of 
humor, men in such groups talk more than women do. Our data do 
not permit us to check the plausible hypothesis that level of satis- 
faction reflects the level of participation. 
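
The significance statement attached to Table XIV can be checked with an ordinary Pearson chi-square computed from the printed counts. The sketch below is only an illustration under that assumption; the function name and the constant holding the tabled critical value are hypothetical labels, not anything given in the report.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if sex and value rating were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


table_xiv = [
    [42, 132, 22],  # men: High, Moderate, Low
    [13, 111, 29],  # women: High, Moderate, Low
]
stat, df = chi_square(table_xiv)
CRITICAL_01_DF2 = 9.210  # tabled chi-square value for p = .01 with 2 degrees of freedom
print(round(stat, 1), df, stat > CRITICAL_01_DF2)
```

The statistic comes out near 13 with two degrees of freedom, which is beyond the .01 point and so agrees with the note under the table.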

There remains the variable of age and experience. Some students 
were fresh out of college; others had taught for more than twenty 
years. Should we expect a difference in attitude toward group dis- 
cussion among students at widely different stages of professional 
maturity? We argued that students just beginning their teaching
would probably feel they were getting more out of conferring with 
experienced colleagues than the older teachers would think they 
could get from youth. Moreover, if small group work may be con- 
sidered still to be an innovation, it would be more readily accepted 
by the younger students. The pattern of respectful listening to 
professional authority may be more firmly established in older 
teachers. Hence we predict that: 

Hypothesis 23. Students with no professional experience are more 
likely to value group work highly; those with long practical experience 
are more likely to reject or to disparage the small group discussions. 


TABLE XV. YEARS OF TEACHING EXPERIENCE AND VALUE RATING GIVEN
GROUP WORK

                            Value Rating for Group Work
Years of Experience       High    Moderate    Low    Total
None                       22        64         6      92
1-5 years                  23        93        21     137
6-10 years                  2        45        15      62
11-20 years*                6*       28*        8*     42*
More than 20 years*         0*       10*        1*     11*
Total                      53       240        51     344

(Distributions significantly different at .01 level.)
* Combined for Chi Square test.


The data of Table XV are in accord with the hypothesis. As 
many as forty-two per cent of the ‘High Value’ group were with- 
out any previous teaching experience, as compared with only twelve 
per cent of the ‘Low Value’ participants. The same conclusion is 
supported also by other break-downs showing association between 
‘Low Value’ ratings and having present full-time employment, 
mainly as a teacher (p > .02 and < .05), and also between ‘Low 
Value’ ratings and having as professional goal administration or 
supervision rather than teaching (p < .01). The group work was 
apparently most helpful to full-time, younger students, inexperienced
in teaching, and eager to profit from the broader experience
of others. The experience factor, for reasons not clear, was oddly
sex-linked. High ratings came predominantly from inexperienced
men; low ratings from women with more than five years’ teaching
experience. Proportion of high-low was about 50-50 among experienced
men or among inexperienced women.
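
The percentages quoted from Table XV, and the pooling of the two longest-experience rows noted under the table, can be retraced as follows. This is a sketch under the assumption that an ordinary Pearson chi-square was applied to the four rows left after pooling; the variable and function names are illustrative only.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = sum(
        (obs - row_totals[i] * col_totals[j] / n) ** 2 / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )
    return stat, (len(table) - 1) * (len(col_totals) - 1)


# Rows of Table XV as (High, Moderate, Low) counts.
none, one_to_five, six_to_ten = [22, 64, 6], [23, 93, 21], [2, 45, 15]
eleven_to_twenty, over_twenty = [6, 28, 8], [0, 10, 1]

# Pool the two sparse rows, as the starred rows of the table indicate.
eleven_plus = [a + b for a, b in zip(eleven_to_twenty, over_twenty)]
table_xv = [none, one_to_five, six_to_ten, eleven_plus]

col_totals = [sum(col) for col in zip(*table_xv)]
pct_high_inexperienced = 100 * none[0] / col_totals[0]  # share of 'High' raters with no experience
pct_low_inexperienced = 100 * none[2] / col_totals[2]   # share of 'Low' raters with no experience

stat, df = chi_square(table_xv)
CRITICAL_01_DF6 = 16.812  # tabled chi-square value for p = .01 with 6 degrees of freedom
print(round(pct_high_inexperienced), round(pct_low_inexperienced), df, stat > CRITICAL_01_DF6)
# prints: 42 12 6 True
```

Pooling keeps the expected counts in the last row from becoming very small, which is the usual reason for combining categories before a chi-square test.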


SUMMARY AND INTERPRETATION 


Probably the most striking revelation of this report is the extent 
of mistaken anticipations in the writer after a quarter century of 
organizing, training and observing small group discussions as part 
of large graduate courses. Of our twenty-three logically plausible 
hypotheses only two were confirmed; seventeen collapsed for want 
of factual support, and four had to be reversed in whole or in part. 

Students who will enjoy and profit from small group participation
could not be identified on the basis of: their own expressed preference;
their level of mastery of the course material; their stated interest
to learn about ‘group leadership’; their general level of enthusiasm
for course topics; or their responses to clusters of
questions apparently indicating sympathy, hostility, self-reliance, 
or ‘intellectualism’. No advantage was demonstrated for groups 
which worked cumulatively on a single topic all semester over 
groups which discussed different issues each week. 

Obtained differences which were statistically significant demon- 
strated that: 

1) Individual-wide variables accounted for value ratings better
than did group-wide variables. Few groups were homogeneously
regarded as good or poor;

2) The best groups were larger (ten to fifteen members) than average
(eight);

3) Students especially interested to learn about ‘fears and anx- 
ieties’ tended to place high value on group work; 

4) Students who rejected all items indicative of ‘authoritarian- 
ism’ placed a low value on group work; 

5) Students who rated groups low were disappointed mainly in 
lack of intellectual stimulation from their fellow-members; 

6) Men, with little or no professional experience in the field, 
were responsible for more ‘High’ ratings; women with more than 
five years of experience gave more ‘Low’ ratings. 








ELEVEN-YEAR-OLD BOYS IN TROUBLE¹


WILLIAM W. WATTENBERG 


Wayne University 


Among workers in the field of juvenile delinquency, we find 
much attention given to three common but not mutually exclusive 
explanations of why youngsters fall into patterns of misconduct. 
Sociologists have tended to concentrate upon factors linked to 
social or economic variables. Such workers as Shaw and his col- 
laborators (5) have painstakingly produced evidence showing that 
delinquency rates are highest in neighborhoods typified by poverty, 
low status, poor housing, and cultural conflict. According to their 
view, delinquency is a relatively normal reaction of young folks to 
bad situations. 

A second line of evidence receives more attention from psychol- 
ogists and psychiatrists. They have found that among delinquents 
there are many who prove to be victims of emotional conflict or 
instability. Healy and Bronner (3) have shown that where delinquent
youngsters had non-delinquent siblings, presumably exposed
to the same social environment, the delinquents differed
from their siblings in being emotionally maladjusted. More re- 
cently, Redl and Wineman (4) have pictured in great detail the 
personality disorganization in a group of very seriously delinquent 
boys. 

A third type of causal situation, less fully explored by research 
workers, has been stated by Washburne (6). He sees delinquency 
as often a product of a child’s inability to apply judgment to the 
control of his impulses. As this could be a temporary condition, 
which would change itself as normal maturing brought increased 
power of judgment, it can account handily for the fact that many 
boys and girls engage in serious misconduct for a while, later correct
their anti-social tendencies without benefit of psychotherapy
or neighborhood reform.

¹ The author wishes to express his indebtedness to Senior Inspector
Sanford Shoults, Inspector Ralph Baker and Lieutenant Francis Davey of
the Youth Bureau, Detroit Police Department, for their coöperation in
making available the records upon which this study is based. Appreciation
is also due to Dean Waldo Lessenger, of the College of Education, Wayne
University, for solving administrative problems permitting completion of
the research.

Clearly, these three explanations of delinquency need not be 
mutually exclusive. Each could be operating either alone or in 
combination for some cases. In the total mass of juvenile delin- 
quents we could reasonably expect to find some with seriously 
disturbed personalities, some normal youngsters who had learned 
patterns of misconduct by living in unfortunate social settings, and 
some who had committed misdeeds because unaware of the full 
implications of their acts. 

The focus of the present study is upon very young boys 
in trouble. In terms of the now popular terminology of develop- 
mental phases, they are preadolescents. This group was chosen 
because it has received relatively little systematic study and be- 
cause its characteristics could be expected to throw additional 
light on the genesis of delinquency. 

In their summary of research on preadolescents, Blair and Burton 
(1) point out a number of qualities which might lead to delinquency 
taking somewhat different patterns from those found in older age
groups. In terms of attitudes toward adults, the preadolescents 
might be more likely to display ambivalence. Among boys, the 
search for masculine identification objects gives greater power to 
gang codes. They are given to seeking models for behavior among 
the next older age group. Their immaturity at a time when impulses 
are rising in power is likely to lead to periods when their value sys- 
tems are somewhat hazy and weak. 

Related to the three lines of explanation previously mentioned, 
we would predict the following factors as likely to be found among 
preadolescents involved in serious misconduct: Because of their 
tendency to ape older adolescents, we would expect that bad neigh- 
borhood conditions, in which delinquency was part of the subcul- 
ture, would be very influential. Because of the temporary weak- 
ness of their value systems, we would expect to find in the total 
group a somewhat higher proportion of relatively normal young- 
sters than among an older group of delinquents. We should also 
expect to find some young folks giving evidence of serious maladjustment,
although they would be present in less heavy a concentration.

In a study of factors related to repeating, reported elsewhere
(8), the author had noted that among eleven-year-olds with police
records repeating was more highly predicted by school grades and 
by the nature of gang activities, and less highly predicted by 
family conditions and social variables than among delinquents 
unselected as to age. This finding, however, did not give an answer 
as to the relative frequency in the total group of youngsters upon 
whom the several influences associated with delinquency might 
be operating. 


PROCEDURE 


It was decided to compare a group of eleven-year-olds known to 
police, as the result of complaints, with a similar group who had 
passed their twelfth birthdays. The age eleven was chosen because 
it was believed a sufficiently large number of cases would be found 
to permit statistically reliable computations and because the re- 
sults of research such as the Harvard Growth Study (2) indicated 
that few eleven-year-old boys have advanced far in pubescence. 
In the sample of seven hundred and forty-seven boys none had 
completed his preadolescent growth spurt before 11.50 years of 
age, and only thirteen in the 11.50-12.49 annual period.

Although the eleven-year-olds could not be considered a pure 
sample of preadolescents nor the older group, of adolescents, yet 
we can say safely that the younger sample would contain a very 
heavy concentration of preadolescents and the older sample would 
be much more strongly saturated with adolescents. 

The files for 1949 of the Youth Bureau of the Detroit Police 
Department were searched and the records of all boys more than 
eleven and less than seventeen years old obtained. There was a 
total of 4,121 such records. On the basis of age of boy at the time 
of police contact these were divided into two groups, consisting 
of 334 eleven-year-olds and 3,787 boys past their twelfth birthdays. 

For every boy there was already in the file a ‘history sheet’, filled 
out by specially assigned officers on the basis of interviews with 
the boy, visits to his home, and their own knowledge of conditions
in the districts which they regularly covered. These sheets con- 
tained some forty-two items of fact or rating concerning the boy, 
his housing, his school, his family, and his neighborhood. For every 
item, a tabulation was prepared and the chi-square calculated. 

The data on the history sheets have been used in a number of 
previous studies (7, 9) and found to be reliably predictive of repeating
and to yield valid discriminations between gang members
and non-members. 


FINDINGS 


The extent of differences between the eleven-year-olds and the 
older boys is indicated by the fact that of the forty-two chi-square 
calculations, in nineteen the null hypothesis could be rejected at 
the two per cent level of confidence. By chance alone, one such 
rejection could be expected to occur. Thus, the number of sig- 
nificant tables was eighteen times above chance expectation. We 
can therefore state with assurance that the eleven-year-olds as a 
group showed some marked contrasts to the older group. For 
convenience, these differences will be reported in clusters having 
elements in common. 
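
The chance expectation cited above follows from the number of tabulations and the significance level. The sketch below reproduces that arithmetic and, under the added assumption (not made in the report) that the forty-two tests are independent, the binomial probability of nineteen or more rejections arising by chance. Items drawn from the same history sheets are unlikely to be fully independent, so the tail figure is only a rough guide.

```python
from math import comb

n_tests = 42        # chi-square tabulations, one per history-sheet item
alpha = 0.02        # two per cent level used for rejection
n_significant = 19  # tabulations in which the null hypothesis was rejected

# Number of rejections expected if every null hypothesis were true.
expected_by_chance = n_tests * alpha

# Binomial probability of at least 19 rejections in 42 independent tests.
tail_probability = sum(
    comb(n_tests, k) * alpha**k * (1 - alpha) ** (n_tests - k)
    for k in range(n_significant, n_tests + 1)
)

print(round(expected_by_chance, 2))  # prints 0.84, i.e. about one table
print(tail_probability < 1e-15)      # prints True
```

With about one rejection expected by chance, the nineteen observed stand eighteen above that expectation.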

Socio-economic indices.—Of the data bearing on socio-economic 
factors or neighborhood conditions, the most objective was the 
ratio of rooms in the dwelling unit to number of occupants. The 
facts are presented in the upper third of Table I. This shows a 
statistically reliable difference to exist. (In this table, if the ‘not 
stated’ category is eliminated, the chi-square total is 7.6; with 
one degree of freedom, P is less than 0.01.) The eleven-year-olds 
came in larger proportion from dwellings with one room or fewer 
per occupant. Bearing out this table were a number of others. 
The police officers found more eleven-year-olds in buildings rated 
substandard; on the chi-square test this relationship was significant 
at a one per cent level of confidence. A similar trend was also noted 
in the case of type of building, mixture of business and residential 
land usage, and rated quality of neighborhood, although in these 
instances the chi-square test was inconclusive. The younger group 
came in greater proportion from rented quarters, neighborhoods 
rated ‘below average’, and areas where business establishments 
were mixed with residences. 

Family conditions.—As far as having both parents present in the 
home, our younger group proved to have the advantage. The 
middle third of Table I gives data showing that more of them came 
from intact homes; more of the older group had lost one or more 
parents by death. The relationship was strong enough to warrant
rejection of the null hypothesis at the 0.001 level of confidence. 
This relationship was supported by other tables. At the two per
cent level of confidence, more of the older group had working
mothers, as contrasted to the eleven-year-olds in whose families
the father was more likely to be the only employed parent. Among
the tables where the chi-square test was inconclusive, the eleven-year-olds
more often came from homes where some parent was at
home to take care of things during the day and evening, as contrasted
to homes unsupervised except at night. On the basis of
interviews with the boy and his parents the officers’ report showed
a statistically inconclusive tendency for the older group to have
parents who frequently quarreled.


TABLE I.—HOME CONDITIONS OF BOYS

                                             11-year-   Older
                                    Total      olds     group     χ²    df      P

Ratio of rooms to occupants in
dwelling units
  One room or less per person       3,127      276      2,851
  More than one room per person       931       55        876     9.0    2   <0.02
  Not stated                           63        3         60

Marital status of parents
  Living together                   2,456      223      2,233
  Separated or divorced               971       81        890    17.6    3   <0.001
  One or both dead                    645       26        619
  Not stated                           49        4         45

Method used by parents in giving
money to boy
  Allowance                           931       93        838
  Pay for work                         98        2         96    64.3    4   <0.001
  On request                        2,152      218      1,934
  None                                809       18        791
  Not stated                          131        3        128

Total                               4,121      334      3,787

Dependency of boys.—The only item giving a clue to dependency 
related to the manner in which the boys received their spending 
money. The bottom third of Table I sets forth the data on this 
point. Here it will be noted that the younger boys, with few excep- 
tions, received money from parents either as an allowance or in 
response to their requests. By contrast, the older boys included a 
sizable minority whose parents gave them no money or else paid
a wage for work done. This relationship was significant at the 0.001 
level of confidence whether or not the ‘not stated’ category was 
treated as a separate class. The ‘not stated’ group is reported, as 
it will be discussed later. 

Expressed attitudes toward institutions.—When interviewing the
boys, the police officers sought information indicating their attitudes
toward home, school, church, and adult neighbors. Obviously,
what the boy would say to a policeman on items where ‘good’ and 
‘bad’ were so patent must be taken with a grain of salt. As a corrective
to falsification, efforts were made to secure objective evidence
of behavior. In general, on both types of evidence the eleven-year-olds
more often gave the conventionally acceptable response or showed 
more conventional behavior. In the upper half of Table 2 we give 
a mixture of both types of data relating to school. Here we note 
that the older group was more likely to openly express dislike of 
school. In addition, a significant minority had quit formal educa- 
tion. Other tabulations gave results showing a similar trend. At 
the one per cent level of confidence, more of the older group ex- 
pressed hostile feelings toward teachers and one or both parents. 
Among the statistically inconclusive tabulations, we found more 
older boys antagonistic to adult neighbors and more eleven-year- 
olds attending church regularly. 

Peer group relationships.—The younger boys appeared to have, 
as a group, better social relationships with other youngsters. Among 
the facts to which the police officers gave close attention was the 
boy’s companions. Data on this point were valued highly for 
police reasons; they had often proved useful in clearing up new 
offenses. The facts were obtained not only from the boy himself 
but from patrolmen and other adults familiar with the neighbor- 
hood. On this basis it was possible to divide the boys into two 
groups: (1) those who belonged to a crowd or a gang which played 
together and (2) ‘lone wolves’. The lower half of Table 2 presents 
the data on this point. The eleven-year-olds included more boys 
who belonged to a crowd or gang and fewer who had been classed 
as ‘lone wolves’. As with the other key tabulations reported above, 
this one also was supported by other data. At the one per cent 
level of confidence, the eleven-year-olds were reported to get along 
better and to be less given to quarrels with classmates in school. 
Also, in their choice of favorite sports, comparatively more chose
baseball and fewer, fishing, than the older group. 

Miscellaneous.—There were a number of other items yielding 
statistically significant differences between the two age groups. 
For the most part these either were not germane to the main pur- 
pose of this study or else were natural results of age differences. 
For the sake of completeness, they are reported herewith in sum- 
mary form: 

1) Fewer eleven-year-olds had paid employment. 


TABLE 2.—SCHOOL ATTITUDES AND PEER GROUP RELATIONSHIPS OF BOYS

                                             11-year-   Older
                                    Total      olds     group     χ²    df      P

Attitude expressed toward school
  Likes                             2,675      271      2,404
  Indifferent                         591       42        549
  Dislikes                            369       17        352
  Hates                                60        2         58    58.1    4   <0.001
  Not stated and not in school        426        2        424

Peer group membership
  Boy was in some sort of peer group 3,632     313      3,319
  ‘Lone wolves’                       473       20        453    11.4    2   <0.01
  Not stated                           16        1         15

Total                               4,121      334      3,787





2) Police officers rated more of them small for their age. 

3) Police officers considered more of them ‘honest’. 

4) On the basis of appearance, police officers rated more of 
them ‘preadolescent’. 

5) More eleven-year-olds spent all their money on entertain- 
ment. 

6) None was allowed to drive the family car. 


DISCUSSION 


On one of the hypotheses, the results were unequivocal. As in- 
dicated by the housing situation and neighborhood ratings, low 
socio-economic status was found in a higher proportion of the 
eleven-year-olds. 

As to the relative proportion of seriously maladjusted youngsters
among the eleven-year-olds, the evidence is indirect. A previous 
study (9) in this series had revealed that among repeaters giving 
evidence of personality distortion, strongly concentrated among 
boys who were in trouble over and over again even though not 
members of gangs, certain items were linked with recidivism and 
reliably predictive of it. Among these were the boys’ expressed 
attitude toward parents, the presence of family tension as evidenced 
by broken homes, and the failure to supply information on such 
practices as the giving of money by parents. The last-mentioned 
was striking. It could be interpreted either as an emotional block- 
ing, as an evidence of shame, or a desire to be close-mouthed about 
family affairs. In any event, in the present study, all these indices 
previously found associated with maladjustment were less frequent 
in the eleven-year-old segment of the sample. Although the evi- 
dence leans in the direction predicted by our hypotheses, it ob- 
viously requires verification by studies utilizing more direct and 
sensitive measures of emotional stability. 

One result of the present study was not fully anticipated. This 
had to do with the greater conventionality in the eleven-year-olds’ 
expressed attitudes toward parents and teachers. If this group 
had hostile feelings, as we would expect they should, these were 
but one side of an ambivalence. Perhaps the fact that they were 
still more dependent, as indicated by their having to rely upon 
parents for spending money, made them more apt to feel uneasy 
at the prospect of open defiance. At this point we can only specu- 
late. Several possibilities suggest themselves. It may be that fewer 
of these youngsters are emotionally disturbed in any marked de- 
gree and that their conventionality is merely a sign of that fact. 
It is equally possible that the conventionality is merely a residue
of childish attitudes which they will out-grow, and, as years bring 
them added independence, rebellion will become more whole- 
hearted. Here, again, there is need for further investigation in 
which such sensitive devices as various projective tests can give 
us a clearer picture of what lies below the surface. 


SUMMARY 


The police files on all boys interviewed on complaint by Detroit 
Youth Bureau officers in 1949 were studied, and the 334 eleven- 
year-olds compared with the 3,787 who had passed their twelfth 
birthdays. The eleven-year-olds as a group were found to come in
greater proportion from poorer socio-economic levels, to be more 
dependent upon the parents, to express more conventional at- 
titudes toward adult-managed institutions, and to have better 
social relationships with other youngsters. 


THE RELATIVE INTELLIGIBILITY OF 
MALE AND FEMALE TALKERS¹ 


B. SILVERSTEIN, R. C. BILGER, T. D. HANLEY, AND M. D. STEER 


Purdue University 


Very early in the history of experimental investigation in speech, 
researchers recognized that males and females constitute different 
experimental samples. The two sexes cannot be assumed to be 
drawn from the same population when some of the voice variables 
(notably pitch) are considered. Accordingly, most of the research 
articles to be found in publications in the speech field report re- 
sults of investigations employing all male experimental samples. 
A few experiments employing female subjects have been reported. 
However, experiments involving both male and female subjects 
are almost unprecedented. 

The paucity of speech research involving female and male-and- 
female groups is particularly to be noted in the series of intelli- 
gibility studies conducted under military auspices during and after 
the Second World War. Many important results of such investi- 
gations, applicable only to a male population, have been published. 
Among the findings which have been demonstrated to have statis- 
tical significance are the following: 

1) Speech signals louder than conversational level are necessary 
to intelligible communication in noise (1, 2). 

2) Instruction and practice in increased syllable duration will 
result in improved intelligibility (7). 

3) Practice involving ‘Read Back’ of transmitted messages 
through difficult transmission conditions will result in improved 
intelligibility (4). 

4) Taking intelligibility tests will improve intelligibility (6). 

5) Two hours of instruction in loudness and clearness will re- 
sult in a substantial improvement in speech intelligibility (7). 

6) Intelligibility tests given under laboratory conditions are 
valid for distinguishing the superior from the inferior speakers, 
with respect to speech intelligibility in high level noise (4). 


1 This research was carried out under contract with the Office of Naval 
Research, Special Devices Center, Human Engineering Division, as Con- 
tract N6ori-104, T. O. II, Project NR-782-003, of which this is Technical 
Report Number SDC 104-2-29. 

However, as previously stated, in all of the investigations from 
which these conclusions were drawn, the samples employed were 
limited to males of military age. Since the population concerned 
with voice communication in high level noise was almost exclusively 
male during the first period of these investigations, it was practi- 
cally imperative that the samples be exclusively male. Today, 
however, an increasing tendency exists for the assignment of women 
to responsible communication positions, within the armed services 
and in volunteer civil defense agencies. Illustrative of this increased 
participation by women in the defense effort is the number of 
women currently engaged in control tower duties at civilian and 
military airfields. 

The increased dependence upon women for vital military voice 
communications makes necessary a reëxamination of the known 
facts about intelligibility to determine their applicability to female 
talkers. It is important, therefore, to ascertain whether or not voice 
communication normative data drawn from investigations invol- 
ving all male subjects may be applied to the female population 
which may be called upon to do vital communication work. Con- 
sidered to be particularly pressing is the question of how well fe- 
male talkers perform when in the presence of high level noise. 

The purpose of this investigation was to determine the relative 
intelligibility of male and female talkers over standard military 
communication equipment in the presence of high level noise. More 
specifically, it was of interest to determine if there were any differ- 
ences in speaking ability manifested between untrained groups of 
male and female talkers; if there were any differences in the effects 
of training for improved intelligibility upon groups of male and fe- 
male talkers; and if there were any differences in intelligibility 
scores of male or female talkers attributable to the sex of the lis- 
tening panel. 


SUBJECTS 


Initially the subjects for this investigation were ninety male 
and ninety female students enrolled in sections of an elementary 
course in public speaking at Purdue University. Of these one 
hundred and eighty subjects, forty-five male and forty-five female 
subjects were designated as the experimental group to be given 
two hours of training in voice communication in addition to being 
tested. The remaining subjects were designated as the control 
group and were tested and retested without intervening training. 
Further division of both the control and experimental groups was 
effected so that there were three subdivisions within each group: 
all male panels, all female panels, and panels composed of equal 
numbers of men and women. 

Of the initial group of one hundred and eighty subjects, forty-two 
were dropped from the investigation because of failure to attend 
training or retesting sessions, or because their scores were selected 
at random as scores to be discarded in order to equate the sizes of 
the subgroups. (In order to employ a standard form of orthogonal 
analysis it was necessary to maintain equal-sized subgroups in the 
experimental and control samples.) 


PROCEDURE 


Instrumentation.—Three Portable Interphone Trainers (Navy 
Device 8-I) were utilized in this investigation. One Device 8-I was 
used as a noise generator and the noise from this source was fed 
through a junction box into the phono input of two other Device 
8-I’s which were used as Interphone Trainers (Fig. 1). The ampli- 
fied noise from each Device 8-I being used as an interphone trainer 
was then coupled to ten sets of headphones, Model ANB-H-1. The 
noise level output was adjusted so that the ‘full noise’ condition 
produced a noise level of 106.5 db, re 10⁻¹⁶ watts/cm², in the head- 
phones, as measured by an ADC Artificial Ear. This noise level, 
used in the intelligibility testing sessions, was set initially by three 
judges on a subjective criterion. It was judged to be a noise level 
which would allow untrained speakers to achieve approximately 
fifty per cent intelligibility on the VCL, 24-Word Multiple-Choice 
Test lists (4). 

The speech input from the carbon microphones, Model T-38C, 
was coupled to the Channel No. 1 carbon microphone input of each 
Device 8-I being used as an Interphone Trainer. The speech channel 
gain was calibrated so that an input of 0.07 volts, at 1,000 cps, 
produced an output of 0.49 volts across the headphones. 

For the ‘reduced noise’ condition, used during the training ses- 
sion, the noise level was reduced 10 db, to 95.6 db, re 10⁻¹⁶ 
watts/cm². The speech channel gain was also reduced by 10 db 
in order to maintain the same signal-to-noise ratio that was used 
in the testing situation with ‘full noise’. 
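
The calibration figures above can be restated in decibels. The following is a minimal sketch, not part of the original procedure: the function names are illustrative, and the speech and noise levels used for the signal-to-noise check are hypothetical values chosen only to show that an equal reduction of both leaves the ratio unchanged.

```python
import math


def voltage_gain_db(v_in, v_out):
    """Gain in decibels for a voltage ratio (20 * log10 of the ratio)."""
    return 20.0 * math.log10(v_out / v_in)


def attenuate_volts(volts, reduction_db):
    """Voltage remaining after the level is lowered by reduction_db decibels."""
    return volts * 10 ** (-reduction_db / 20.0)


def signal_to_noise_db(speech_db, noise_db):
    """Signal-to-noise ratio as a difference of two levels in decibels."""
    return speech_db - noise_db


# Speech channel calibration quoted in the text: 0.07 volts in, 0.49 volts out.
gain_db = voltage_gain_db(0.07, 0.49)
print(f"speech channel gain: {gain_db:.1f} db")

# Output voltage for the same input once the gain is lowered by 10 db.
print(f"output after 10 db reduction: {attenuate_volts(0.49, 10.0):.3f} volts")

# Hypothetical levels: lowering speech and noise by the same 10 db
# leaves the signal-to-noise ratio where it was.
snr_full = signal_to_noise_db(100.0, 95.0)
snr_reduced = signal_to_noise_db(100.0 - 10.0, 95.0 - 10.0)
print(snr_full, snr_reduced)
```

Because both channels are lowered by the same number of decibels, the difference between them, and hence the listening difficulty attributable to the ratio, is held constant between training and testing.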

Method.—The procedure followed in this study is similar to the 
general pattern followed in several other studies of talker intelli- 
gibility in the presence of high level noise previously reported from 
this laboratory (6, 7, 8). The procedure in this investigation may 
be outlined as follows: 

A) A pre-training test of intelligibility given to all subjects; 

B) A training period for the experimental group only; and 

C) A post-training test of intelligibility for all subjects. 

Fig. 1. Block diagram of instrumentation used to test speaker intelligibility 


Pre-training test.—The subjects came to the testing room directly 
from their classrooms. They had been organized into specific test 
panels of from seven to ten members previous to their introduction 
into the testing situation. Each panel member was assigned a seat 
from among ten straightback, tablet armchairs arranged in a 
straight line. The seats were partially enclosed by booths of celotex 
sheeting mounted between the chairs. Generally, only one panel 
was tested at a time, but it was possible to test two panels simul- 
taneously by using both of the interphone trainers. 

After being seated, the subjects were given a brief period of 
orientation to the task and the equipment. The panel members 
were told, also, that they were participating in an investigation 
being carried out by Purdue University for the Office of Naval 
Research; that the investigation was concerned with voice com- 
munication in the presence of high level noise, and that they would 
serve both as speakers and listeners during the investigation. Fol- 
lowing instruction in the proper use of the microphone and head- 
phones, the subjects were instructed to put on their headsets and 
they were given specific instructions over the interphone trainer 
for the VCL, 24-Word Multiple-Choice Intelligibility Test (4). 
After the instructions had been given, the ‘reduced noise’ condi- 
tion was transmitted to the headphones and the experimenter 
read a sample intelligibility test. Following this, the ‘full noise’ 
condition was transmitted to the headphones, and Form A of the 
24-Word Multiple-Choice Intelligibility Test was administered. 
The subjects were dismissed immediately after testing was com- 
pleted. 

Training.—Three to four weeks after the administration of the 
pre-training test, all of the experimental subjects were called back 
with their original panel mates and given a one-hour training ses- 
sion. The first part of the training session consisted of a lecture 
pointing up the importance of loudness and clear pronunciation in 
radio-telephone communication.² 

The training lecture emphasized the use of a loudness level ‘just 
short of shouting’ for maximum intelligibility. It was pointed out 
that the speaker might use the speech signal in his own headphones 
(sidetone) to determine whether he was using sufficient loudness 
for the particular noise barrier. In addition to these instructions, 
the correct use of the microphone was emphasized again. Instruc- 
tions for increased clearness included the suggestion that all of the 
sounds of the words should be uttered precisely and that all of the 
syllables should be given equal loudness and equal duration. 

Following this brief lecture period the subjects were given an 
opportunity to practice the techniques which had been discussed. 
Each subject was given a printed military-type message which he 
was directed to read to another subject in his circuit.³ The speci- 
fied listener called for repeats until he was certain of the message. 


2 A detailed description of this lecture can be found in SDC Technical 
Report, No. 104-2-4, Purdue Voice Science Laboratory, Lafayette, Indiana, 
1947. 
3 This departure from the procedure of previous investigations conducted 
at this laboratory (6, 7, 8) is to be noted. Hitherto the training messages 
utilized consisted of non-military communications. 

Then he repeated it for the originator, who corrected any errors 
in message content. After the message recipient read back the mes- 
sage correctly, another pair of subjects was given an opportunity 
to engage in similar practice. During this training period each 
subject acted as the originator and as the recipient of a message 
to be ‘read back’ correctly. This practice period was interspersed 
with further instructions and specific criticisms in which was 
stressed the importance of the techniques being employed for in- 
creased intelligibility. 

Original messages and repeat-backs were spoken over the same 
interphone circuits used in testing. The noise level during the 
training program was set on ‘reduced noise’ condition. 

Post-training test.—Three to four weeks after the experimental 
group training sessions, post-training tests were administered to 
both the experimental and control subjects. All subjects were tested 
with their original panel-mates and all subjects occupied the same 
seats and used the same equipment that they had used in previous 
sessions. Form B of the VCL, 24-Word Multiple-Choice Intelli- 
gibility Test was administered to all subjects as the post-training 
test. All calibrations were identical to those used in the pre-training 
test. 

Statistical analysis.—As previously indicated, the purpose of this 
study was to investigate certain aspects of the performance of male 
and female talkers, specifically their training. For statistical analy- 
sis, per cent intelligibility scores for individuals and subgroups 
were computed. The mean number of items correctly marked by a 
listening panel for a given talker, divided by the number of words 
spoken by the talker (twenty-four) constituted the intelligibility 
score for the talker. 
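
The scoring rule just described can be sketched as follows. This is an illustration under stated assumptions rather than the laboratory's own scoring routine: the function name and the listener counts are hypothetical, and only the divisor of twenty-four words comes from the text.

```python
from statistics import mean

WORDS_PER_TEST = 24  # each talker speaks a list of twenty-four words


def intelligibility_score(correct_counts, words_spoken=WORDS_PER_TEST):
    """Per cent intelligibility for one talker.

    correct_counts holds, for each listener on the panel, the number of
    items that listener marked correctly for this talker.
    """
    if not correct_counts:
        raise ValueError("at least one listener is required")
    return 100.0 * mean(correct_counts) / words_spoken


# Hypothetical panel of six listeners scoring a single talker.
listener_counts = [12, 15, 9, 14, 13, 12]
score = intelligibility_score(listener_counts)
print(f"{score:.1f} per cent")
```

Averaging over the panel before dividing means that a talker's score does not depend on the number of listeners present, which matters here because panels ranged from seven to ten members.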

The data collected were treated by three separate analyses. First, 
the data from the pre-training test were analyzed by an analysis 
of variance technique to determine if there were any statistically 
significant differences, with respect to speaker intelligibility, be- 
tween the sex subgroups at the outset of the experimentation. The 
data from the post-training test were treated by the same analysis 
of variance design to determine if any differences were present in 
the post-training test results. Finally, so that the comparison would 
be based on the ‘best-fitting’ regression line rather than the as- 
sumption of a perfect regression, an analysis of covariance was used 
to compare pre-training to post-training scores in order to deter- 
mine if any significant differences were present in the intelligibility 
gains made by the sex subgroups. 

Whenever significant differences were found, ‘t’ tests for the 
significance of difference between means were used to isolate the 
differences. 
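
A ‘t’ test of this kind can be computed from summary figures alone. The sketch below assumes the pooled-variance form of Student's t for two independent groups, which the report does not specify, and the function name is illustrative. The input values are the pre-training means and standard deviations given in Table I for the highest and lowest scoring control subgroups; the choice of that pair is only an example.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's t for two independent means, using a pooled variance.

    Returns the t-ratio and its degrees of freedom.
    """
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / std_err, df


# Control subgroups, pre-training test: 64.9 (SD 11.0) vs. 48.9 (SD 9.0), N = 23 each.
t_value, df = pooled_t(64.9, 11.0, 23, 48.9, 9.0, 23)
print(f"t = {t_value:.2f} with {df} degrees of freedom")
```

The resulting ratio would then be referred to a table of t for the stated degrees of freedom to judge significance.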


TABLE I.—SUBGROUP MEANS AND STANDARD DEVIATIONS FOR TEST AND 
RETEST 

Subgroup          N     Test      SD    Retest      SD

Experimental Group (Training) 
Male             23     57.1    10.6      69.2    10.2
Female           23     52.3    12.6      69.0     8.3
Mixed            23     53.2     9.9      68.2     9.0
Total            69     54.2    11.1      68.5     9.2

Control Group (No-Training) 
Male             23     64.9    11.0      62.8    12.4
Female           23     48.9     9.0      53.5     8.9
Mixed            23     53.0     9.5      59.1     9.8
Total            69     55.6     9.9      58.5    10.5


RESULTS 


The means and standard deviations of the intelligibility scores 
for all subgroups, experimental and control, on both the pre-train- 
ing and post-training intelligibility tests are presented in Table I. 

Relative Intelligibility of Untrained Male and Female Speakers.— 
The results of the analysis of variance utilized to test the signif- 
icance of differences in the pre-test data are presented in Table II. 
The ‘F-ratio’ for between sex subgroups is significant at the one 
per cent level; but, before any conclusions may be drawn, the 
significant interaction, experimental-control X sex subgroups, must 
be noted, investigated and interpreted. 

Since the interaction experimental-control X sex subgroups is 
significant at the 2.5 per cent level, it is permissible to conduct 
separate analyses of variance on each of the main groups in order 
to investigate the source of the interaction. The results of these 
analyses (see Table III) indicate that the reason for the significant 
interaction was that there were differences, significant beyond the 
one per cent level, between the sex subgroups within the control 
group, while there were no significant differences within the experi- 
mental group. However, the magnitude of the ‘F-ratio’ in the case 
of the control group is so great that it may be reasoned that the 
significant interaction is the result of chance allocation of subjects 
to experimental or control group. 


TABLE II.—ANALYSIS OF VARIANCE TO TEST INITIAL DIFFERENCES IN 
INTELLIGIBILITY 

Source of variation                  Sum of squares     df    Mean square    ‘F-ratio’    Probability
Between sex subgroups                      2,724.26      2       1,362.13        11.80    P < .001
Between experimental-control                  67.91      1          67.91           <1    —
Interaction: experimental-control
  X sex subgroups                            759.52      2         379.76         3.92    .01 < P < .025
Within groups (error)                     15,239.00    132         115.45


TABLE III.—ANALYSIS OF VARIANCE TO TEST INITIAL INTELLIGIBILITY 
DIFFERENCES BETWEEN SEX SUBGROUPS WITHIN THE EXPERIMENTAL 
AND CONTROL GROUPS 

Group          Source of variation      Sum of squares     df    Mean square    ‘F-ratio’    Probability
Experimental   Between sex subgroups            305.31      2         152.46         1.29    0.25 < P < 0.50
               Within sex subgroups           7,784.19     66         117.94
               Total                          8,089.50     68
Control        Between sex subgroups          3,178.49      2       1,589.24        15.57    P < 0.001
               Within sex subgroups           6,734.78     66         102.04
               Total                          9,913.27     68

In summary, the results of analysis for sex differences in intelli- 
gibility prior to training reveal significant differences favoring the 
male sex in the control group. That a similar result was not found 
in the experimental group is believed to be a chance effect related 
to assignment of subjects to groups. 
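
The ‘F-ratios’ for the separate analyses within the experimental and control groups follow directly from the sums of squares and degrees of freedom in Table III. The sketch below shows that arithmetic; the function name is illustrative, and significance would still have to be read from an F table, which is not reproduced here.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """F-ratio: between-groups mean square over within-groups mean square."""
    mean_square_between = ss_between / df_between
    mean_square_within = ss_within / df_within
    return mean_square_between / mean_square_within


# Sums of squares and degrees of freedom as given in Table III.
f_experimental = f_ratio(305.31, 2, 7784.19, 66)
f_control = f_ratio(3178.49, 2, 6734.78, 66)

print(f"experimental group: F = {f_experimental:.2f}")
print(f"control group:      F = {f_control:.2f}")
```

The contrast between the two ratios is the basis for the statement that the initial sex differences were confined to the control group.
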

Effect of mixed listening panels upon speaker intelligibility, pre- 
training test.—In an attempt to determine the effect a mixed lis- 
tening panel could have upon speaker intelligibility scores, ‘t’ tests 
for the significance of differences between means were calculated 
as follows: 

(a) All male subgroups vs. males in mixed subgroups, and 

(b) All female subgroups vs. females in mixed subgroups. 

Neither of the ‘t-ratios’ approached significance. 


TABLE IV.—ANALYSIS OF VARIANCE TO TEST FOR DIFFERENCES IN 
POST-TRAINING TEST SCORES 

Source of variation                       Sum of squares     df    Mean square    ‘F-ratio’    Probability
Between sex subgroups                             587.48      2         293.74         2.87    0.05 < P < .10
Between groups (experimental-control)           3,426.04      1       3,426.04        33.44    P < 0.001
Interaction: experimental-control
  X sex subgroups                                 428.19      2         214.10         2.09    .10 < P < .25
Within groups (error)                          13,522.39    132         102.44
Total                                                       137


TABLE V.—ANALYSIS OF COVARIANCE TO TEST DIFFERENCES IN 
IMPROVEMENT BETWEEN INITIAL AND FINAL TEST 

Source of variation                       Sum of squares     df    Mean square    ‘F-ratio’    Probability
Between sex subgroups                             123.10      2          68.55            —    —
Between experimental-control                    3,762.87      1       3,762.87        41.05    P < .001
Interaction: experimental-control
  X sex subgroups                                 154.81      2          77.40            —    —
Within cells (error)                           12,008.64    131          91.67

Sex subgroups speaker intelligibility differences in post-training 
test.—The results of the analysis of variance used to test for the 
significance of differences on the post-training test (see Table IV) 
indicate that there were no significant differences between sex 
subgroups on the post-training test. The highly significant differ- 
ence between experimental and control groups is the result which, 
on the basis of previous research (5), was expected. 

Sex subgroup speaker intelligibility gains.—Table V presents the 
results of the analysis of covariance used to determine if any sig- 
nificant differences existed in the relative degree of improvement 
made by the sex subgroups. The failure of the ‘F-ratio’ for between 
sex subgroups to reach a significant level indicates that, when initial 
differences in speaker intelligibility are taken into account, males 
and females benefit equally from training (whether training con- 
sists of the one-hour lesson or merely of a retest). 


CONCLUSIONS 


Within the limitations of this investigation, the following con- 
clusions seem justified: 

1) The existence of sex differences, with respect to the ability 
of untrained talkers to speak intelligibly in the presence of high 
level noise, is demonstrated by this investigation. Differences favor 
the male sex. 

2) Wherever these sex differences are found, training, if only 
the training incidental to undergoing a retest of speaker intelli- 
gibility, serves to eliminate these differences. 

3) A one-hour period of training for improved intelligibility 
results in a significantly greater gain in intelligibility than that 
made by a control group on test-retest. This conclusion is applicable 
to all male, all female, and mixed groups. 

4) Taking a test for speaker intelligibility in the presence of 
high level noise results in statistically significant gains in intelli- 
gibility on subsequent tests. However, when initial scores are 
high, the trend is not consistent and may even be reversed. 

5) The sex of the auditor seems to have little or no effect upon 
the intelligibility scores of male and female speakers. 

6) Male and female speakers benefit equally from intelligibility 
training. 


INTER-GRADE COMPARISONS OF WORD 
FREQUENCIES IN CHILDREN’S WRITING 


GERTRUDE HILDRETH 


Brooklyn College 


A practical alphabetical list indicating the relative frequency 
with which words are used by children in writing is a requisite 
for elementary school spelling instruction. The basic assumption 
underlying the construction of the list devised by Hildreth and 
Salisbury¹ and of the revision prepared by Parke² is that the words 
commonly used in children’s everyday writing should be taught 
ahead of those less commonly used. The total frequencies for all 
grades combined as given in the Rinsland Vocabulary³ were used in 
the construction of the Hildreth-Salisbury list as well as the New 
York City list when preliminary correlation studies showed the 
high degree of correspondence in frequency rank from grade to 
grade for the commoner words in the total Rinsland list. 

It is important to determine the degree of relationship in fre- 
quency rank for words of the separate grade lists of the Rinsland 
Vocabulary and total frequency for all grades combined, because 
this information would aid in determining whether the total fre- 
quencies for particular words are as valid for word selection and 
sequential gradation of spelling words as the separate grade lists. 
If so, the separate sub-lists by grades can be disregarded, and the 
total frequency column in Rinsland furnishes valid information 
for preparing word lists arranged in frequency levels; separate 
grade lists for the intermediate and upper elementary grades then 
become unnecessary. 

Another interesting question is the stability in frequency rank 
from grade to grade of words more commonly used by children in 
writing in comparison with those less commonly used. For example, 
does such a common word as ‘chair’ have a more consistent rank 
in each separate grade than a less commonly used word such as 
‘chapter’? For younger children in Grades III and IV teachers are 
chiefly concerned about a short list of high frequency words. 


1 Gertrude Hildreth. “Spelling as a language tool.” Elementary School 
Journal, XLVIII (September, 1947), 33-40. 

2 Margaret Parke. A Manual to Guide Experimentation With Spelling 
Lists A, B, and C. New York City Board of Education, 1951. 

3 Rinsland, H. D. A Basic Vocabulary of Elementary School Children. 
New York: The Macmillan Co., 1945. 

Method.—The problem was attacked by making a random sam- 
pling of one hundred words from the Hildreth-Salisbury List, 
Levels I through VIII, Level I consisting of the most commonly 
used words according to total frequency such as ‘my’, ‘very’, ‘our’; 
Levels VI, VII and VIII consisting of relatively infrequently used 
words such as ‘zone’, ‘machinery’, ‘quart’, and then comparing 
the deviations in rank for these one hundred words in the separate 
Rinsland grade lists for Grades III, IV, VI and VIII. Omitted 
from the one hundred words were abbreviations, two-word com- 
binations, and proper names. 

Words in the Rinsland grade levels I and II were omitted from 
these comparisons because these grades obviously present a differ- 
ent problem in spelling word selection than the higher years. 
Grades V and VII were omitted because it seemed unnecessary 
to do the computations for each one of the separate upper ele- 
mentary grades. 

In order to determine the intergrade relationships for the com- 
moner words the same comparisons in terms of deviation in fre- 
quency rank were made for the forty commonest words in the 
sampling of one hundred according to Rinsland total frequency. 
These forty words correspond roughly to the commonest 1800- 
2000 words used by elementary school children. The entire hundred 
words correspond approximately to the commonest 4500-4800 
words used by children in their writing. Intergrade deviations for 
the forty commonest words were not computed because the trends 
are evident from the comparisons with the respective grade fre- 
quencies and total frequency for the hundred-word list. 

Frequency distributions were made for these deviations in rank 
without respect to sign in step intervals of 1 and the medians for 
all these distributions were computed. Table I shows the distribu- 
tions of the deviations, the medians for each set of deviations, and 
the range in deviation for each set of comparisons. In Table I, 
T stands for Total Frequency. The grade designations at the head 
of each column, e.g., 3-4, 3-6, etc. mean Grades III and IV, III 
and VI, and so on. The medians were computed on ungrouped 
distributions of deviations. 

The following deviations in frequency rank for each of the hun- 
dred words were computed: Grades III and IV, III and VI, III 
and VIII, III and total; IV and VI, IV and VIII, IV and total; 
VI and VIII, VI and total, VIII and total. For the forty high 
frequency words the following deviations in frequency rank were 
computed: III and total; IV and total, VI and total, VIII and 
total. 


TABLE I. DISTRIBUTIONS OF DEVIATIONS IN RANK AT VARIOUS GRADE 
LEVELS FOR SELECTIONS OF WORDS IN THE HILDRETH-SALISBURY LIST 

(medians computed on ungrouped data) 

               Data for 100 words in the total list                                           Data for 40 commonest words
Deviation      3-4     3-6     3-8     3-T     4-6     4-8     4-T     6-8     6-T     8-T     3-T     4-T     6-T     8-T
10-50+    [cell entries for these rows are not legible in this copy]
8                7      10      13       5       3       6       7      12      11       8       1       2       5       3
6               12       9       9      14      10      10       6      10      11      15       7       1       5       6
4               12      11       4      16      13      11      14       8      18      11      11       8       8       6
2               16      13       6      17      18      16      19      13      19      20       7      10       7       9
0               21      14      12      15      11      14      30      15      20      17      11      16      10       7
N              100     100     100     100     100     100     100     100     100     100      40      40      40      40
Range         0-37  0-52.5    0-50    0-34    0-43    0-42    0-23    0-37    0-22    0-30    0-16    0-13    0-22    0-28
Median        6.17    9.14    12.0     6.2     7.3    7.83    4.12     8.5    5.14     6.3     4.4     3.0     4.6     5.2
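
The comparison described in this section, deviation in rank without respect to sign followed by a median over the sampled words, can be sketched briefly. The ranks below are hypothetical and serve only to show the computation; they are not taken from the Rinsland tabulations, and the function name is illustrative.

```python
from statistics import median


def rank_deviations(ranks_a, ranks_b):
    """Absolute difference in frequency rank for each word found in both lists."""
    return {word: abs(ranks_a[word] - ranks_b[word])
            for word in ranks_a if word in ranks_b}


# Hypothetical frequency ranks for five sampled words.
grade_3_ranks = {"my": 1, "very": 3, "chair": 10, "quart": 40, "April": 55}
total_ranks = {"my": 1, "very": 2, "chair": 12, "quart": 33, "April": 30}

deviations = rank_deviations(grade_3_ranks, total_ranks)
median_deviation = median(deviations.values())

print(deviations)
print("median deviation:", median_deviation)
```

The median is used rather than the mean so that a single word with a very large deviation, such as a seasonal or specialized word, does not dominate the comparison between two lists.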

Results—Table I shows the median deviations in rank for the 
frequency order of word usage in the comparisons made among 
the respective grades and total frequency to be as follows: 

Grades III and IV, 6.17; III and VI, 9.14; III and VIII, 12.0, 
III and total, 6.2; Grades IV and VI, 7.3, IV and VIII, 7.8, IV 
and total, 4.12; Grades VI and VIII, 8.5, VI and total, 5.14, VIII 
and total, 6.3. These results indicate that the different words are 
used with about the same relative frequency in the upper elemen- 
tary grades. There is so little difference in the deviations for word 
frequency in Grades IV and VI, IV and VIII, and VI and VIII, 
(7.3, 7.8, and 8.5) and these deviations are so small as to indicate 
that separate frequency lists for these grades need not be con- 
structed. 

The Grade IV frequency ranks are closest to the total, possibly 
because Grade IV represents a halfway point in the range of grades 
from I to VIII and would be less influenced by the brevity of the 
Grade III list or the wide range of the Grade VIII words. 

Inspection of the data for Grade III shows that although the 
deviations in comparison with the separate grades are lower for the 
adjacent Grade IV than for Grades VI and VIII, nevertheless the 
comparison between Grade III and total frequency shows virtually 
the same deviation as that between Grade VIII and total, both of 
these deviations being slightly higher than for Grades IV and 
total, and Grade VI and total. These findings suggest that the 
Grade III list is somewhat more restricted, less representative than 
the Grade IV and Grade VI lists, as would be expected from a 
cursory examination of the writings of third-graders. By Grade IV 
children are ‘hitting their stride’ in writing, and using almost as 
wide a range of the most common words with about the same 
frequency as they will use them later on in the higher grades. 

Our interest in Grade III words for purposes of teaching spelling 
is confined chiefly to the 2000 or so commonest words in English 
usage, rather than the entire range of which the hundred words 
are representative. 

The comparisons for the Grade VIII frequencies seem a bit out 
of line as do the comparisons for Grade III, but for a different 
reason. The Grade VIII list appears to be more influenced than 
the lists for Grades IV and VI by words used in formal school 
theme writing and formal correspondence. 

The smaller deviations for each of the separate grade compari- 
sons with total frequency are due in part to intercorrelation, since 
the total frequency contains the frequency tabulations for the 
particular grade in question as well as for all others; but the lower 
deviations are also due in part to the greater stability of the total 
frequency list which combines frequency counts for eight separate 
grades and hence is many times larger than any of the separate 
grade frequencies. 

Some words show wide deviations in frequency rank from grade 
to grade, some relatively small; for example, the average deviation 
in all intergrade comparisons for the word ‘got’ is 1; for ‘dear’, 4; 
for ‘handkerchief’, 16; for ‘April’, 29. 

For the forty words of highest total frequency the median devi- 
ations are as follows: Grade III and total, 4.4; IV and total, 3.0; 
VI and total, 4.6; VIII and total, 5.2. 

These deviations compared with those for the total list of one 
hundred words prove what one might expect—that the frequency 
rank order for the commonest words is more consistent grade for 
grade than the order for less common words. The egocentric ‘my’ 
and ‘our’ come out on top no matter what the grade level of chil- 
dren using the words. 

The less frequently a word is used by elementary school children 
according to total frequency count, the less consistent the rank 
from grade to grade, in general. This conclusion was to be expected, 
because uncommon words are in a sense specialized, e.g., ‘ma- 
chinery’, ‘quart’, ‘office’, ‘April’. Who can predict with certainty 
just when and where these words will be used in writing by anyone? 

The wide deviations certain words show (‘chance’ has a deviation 
of 50 for Grades III and VIII; ‘April’, a deviation of 52.5 for 
Grades III and VI; ‘office’, a deviation of 44 for Grades III and 
VI) are not entirely due to chance. These are obviously less fre- 
quently used words in general and words of a more highly spe- 
cialized character. Inspection of the original tally sheets proves 
that the words of highest frequency among these forty commonest 
words show more consistency in rank than the total forty com- 
moner words in the selection of one hundred. 

For the forty commonest words, Grade III now appears to be 
more in line with the other grades. (Note the deviations for Grade 
III and total for the hundred words and Grade III and total for 
the forty high frequency words.) For the forty words, the Grade 
III median deviation compared with the total frequency is slightly 
under that for Grades VI and VIII, but still higher than for Grade 
IV. This finding is more significant than the results for the hundred 
word comparisons because these forty words represent the com- 
monest 1800-2000 words in frequency of use. This is the maximum 
vocabulary with which any spelling work in the primary grades 
need be concerned. 

In evaluating all these comparisons some allowance must be 
made for statistical unreliability of the differences between medians 
due to errors of sampling. 

These results tend to indicate that the separate grade frequencies 
as given in Rinsland probably have no more validity for ‘grading’ 
spelling words in order of frequency of use in writing than the 
Rinsland total frequency list for all grades. There can be little 
question of this conclusion for the 2000 or so commonest words 
used in the English language. In the long run, the total frequency 
list representing the largest number of cases is the most stable, 
valid, and reliable for the sequential gradation of spelling words. 
The highly specialized nature of word usage above the 2000 or the 
2500 word limit suggests the difficulty of determining valid grade 
placement of these words. 

Common sense suggests that word selection for spelling in the 
primary grades does present a somewhat different problem than 
word usage in Grades IV and above. No one would advocate 
giving the younger children words for study arbitrarily selected 
from a frequency rank list no matter how consistently the com- 
monest words are used at all grade levels. 

Here is a promising field for further research. Additional studies 
should be made to determine the inter-grade relationships in fre- 
quency rank for a larger sampling of words in each of the separate 
grade levels of the Rinsland List and throughout the entire range 
of the words listed there. More extensive studies should be made 
of the relative correspondence in frequency rank of common and 
uncommon words, of special terms, proper names, and so on. It 
would be interesting also to discover the relationship that exists 
when the word usage of children in the elementary grades is compared 
with the word usage of high school students and adults. 














COMPARISON OF PSYCHOLOGY INSTRUC- 
TORS AND NATIONAL NORMS ON THE 
PURDUE RATING SCALE 


A. W. BENDIG 


University of Pittsburgh 


As was pointed out in a previous report (2), student ratings of 
instructors in introductory psychology can serve two needs: (a) 
as part of a multiple criterion of teaching competence when com- 
bined with other measures of the instructors’ teaching behavior, 
and (b) as a source of information for the instructor in diagnosing 
his teaching strengths and weaknesses as seen by his students. 
Before student ratings can be used in an evaluative procedure 
much more needs to be known about the influence of characteristics 
of the students (sex, academic achievement, interests, etc.) upon 
their ratings and about the relationships of such ratings to other 
measures of teaching competence (supervisor and peer evaluations, 
objective test scores, speech patterns, etc.). However, student scales 
can more immediately be used to aid the instructor in discovering 
and modifying his more obvious deficiencies. Resistance to the 
use of student rating scales in evaluating the competence of college 
teachers is both widespread and, in view of the obvious inade- 
quacies and unproven validity of most scales, justifiable at this 
time. Their usefulness to the instructor as a help in self-diagnosis, 
however, is more accepted and rests upon a sounder basis. 

One necessity in using student scales as diagnostic tools is nor- 
mative data. For one such set of scales, the Purdue Rating Scale 
for Instruction (PRSI), percentile norms based upon the mean 
ratings of two hundred and five college instructors in many different 
subject matter areas at Purdue University have been provided by 
Remmers and Baker (9). In previous semesters these norms have 
been utilized in reporting student ratings to psychology instructors 
at the University of Pittsburgh who requested a student evaluation 
of their teaching. However, it soon became evident that these norms 
could not be meaningfully applied to the ratings of our instructors. 
For example, the mean rating of introductory psychology instruc- 
tors on PRSI scale 3 (Fairness in Grading) fell at the 90th percentile 
of the Purdue norms and at the 13th percentile on scale 10 (Stimulating 
Intellectual Curiosity). These results suggested that either 
our sample of departmentally homogeneous instructors was not 
comparable to the heterogeneous group of instructors reported upon 
by Remmers and Baker, or that University of Pittsburgh students 
evaluate their instructors somewhat differently than do students 
at Purdue University. 

The first of the above suggestions appears to be the more tenable. 
All of our instructors are employed by the same department and have 
similar professional interests and training. In contrast, the Purdue 
norms are based upon ratings of instructors from many different 
departments and disciplines. In other reports it has been suggested 
that inter-disciplinary differences in the subject matter taught may 
differentially affect student ratings of instructors (6) and even 
intra-department course content variations may be important 
(4). In addition to the subject matter homogeneity, most of our 
instructors tend to be younger than the average age of university 
instructors and to have had less teaching experience. Data reported 
by Goodhartz (7) suggest that younger instructors tend to be rated 
more favorably by students than do older staff members. Descrip- 
tions of the professional characteristics of our instructors can be 
found in previous articles (2, 4). As to student differences between 
Purdue and Pittsburgh, population data on students enrolling in 
introductory psychology at the University of Pittsburgh led to 
the conclusion that “the day-time students fit the picture of the 
average college student at most institutions” (2, p. 169). 

The obvious solution to our norms problem was to formulate 
our own norms based upon the data available upon our instructors. 
However, the non-comparability of our data raised a question as to 
the reliability of the PRSI scales when used with our sample. 
Remmers and Baker (9, p. 4) report high reliability for the scales 
based upon one hundred and fourteen instructors each rated by 
varying numbers of students. Before our proposed norms could be 
used, the reliability of the mean ratings based upon our sample had 
to be investigated. 


PROCEDURE 


During the Spring and Fall semesters of 1951 eleven instructors 
taught introductory psychology to daytime undergraduate stu- 
dents. Of the eleven, ten were male. They varied in academic rank 
from the lecturer to associate professor levels. All had had a minimum 
of one year of college teaching before instructing in 
this course. The PRSI ratings were collected two or three class 
periods before the final examination by members of the department 
other than those teaching in this course. Quantification of the 
ratings followed the procedure described in the PRSI Manual (9, p. 
12). Only data from the first ten scales of the PRSI were used since 
the primary need for normative data was on the scales referring 
to instructor characteristics. The remaining sixteen scales of the 
PRSI refer to course characteristics. The number of students rat- 
ing each of the eleven instructors varied from seventeen to ninety- 
eight. 

The reliability of the average ratings from each scale was com- 
puted by two methods: (a) the intraclass reliability estimate de- 
scribed by Ebel (6), and (b) the generalized reliability formula 
developed by Horst (8). When unequal numbers of students rate 
each instructor these two estimates may differ somewhat because 
of the different weighting applied to the average rating of each 
instructor. The average rating of a given instructor contributes to 
the intraclass estimate in proportion to the number of students 
rating that particular instructor, whereas the generalized formula 
weights the average rating of each instructor equally in determin- 
ing scale reliability, regardless of the differing number of raters 
from instructor to instructor. 
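
The difference in weighting can be illustrated with a short sketch. The first function is the usual one-way analysis of variance estimate of the reliability of average ratings, in which each instructor's mean contributes in proportion to his number of raters. The second is only an equal-weight analogue written for this illustration; it is an assumption and should not be read as the exact expression given by Horst (8). The ratings and function names are invented.

```python
from statistics import mean, variance


def intraclass_reliability_of_means(ratings):
    """(MS_between - MS_within) / MS_between from a one-way ANOVA.

    `ratings` holds one list of student ratings per instructor; each
    instructor's mean is weighted by his number of raters.
    """
    k = len(ratings)
    n_total = sum(len(r) for r in ratings)
    grand_mean = sum(sum(r) for r in ratings) / n_total
    ss_between = sum(len(r) * (mean(r) - grand_mean) ** 2 for r in ratings)
    ss_within = sum(sum((x - mean(r)) ** 2 for x in r) for r in ratings)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return (ms_between - ms_within) / ms_between


def equal_weight_reliability(ratings):
    """1 - (average error variance of a mean) / (variance of the means).

    Every instructor counts equally, whatever his number of raters.
    """
    means = [mean(r) for r in ratings]
    error_of_means = mean(variance(r) / len(r) for r in ratings)
    return 1 - error_of_means / variance(means)


# Invented ratings: equal and unequal numbers of raters per instructor.
balanced = [[80, 90, 85], [60, 70, 65], [95, 85, 90]]
unbalanced = [[80, 90, 85, 85], [60, 70], [95, 85, 90]]
```

With equal numbers of raters the two functions return the same value; with unequal numbers they diverge, which is the behavior described in the paragraph above.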

The obtained results in terms of the mean, median, standard 
deviation, and the two reliability estimates of each scale can be 
found in Table 1. In addition, comparable data from the Purdue 
norm group are also given. 

It is to be noted in Table 1 that PRSI scale 5 (Presentation of 
Subject Matter) was the most discriminating scale with our homo- 
geneous group of instructors and scale 3 (Fairness in Grading) 
appears the poorest. This latter result is most probably attributable 
to the well-structured departmental grading policy which was fol- 
lowed by all instructors and permitted little individual instructor 
influence over the grade achieved by each student (1, 5). In addi- 
tion, the mean rating for scale 3 was the highest of the ten scales, 
indicating the students approved of the objective grading system 
used and recognized that there was little difference between in- 
structors as to this characteristic. Scales 1, 5, 7, 9, and 10 appear 
to be quite reliable, while the reliability estimates of the remaining 
five scales suggest that less reliance should be placed upon their 
evaluation of our introductory psychology instructors. Previous 
data have indicated a generally negative skew to the distribution 
of ratings on these scales (5; 9, p. 10), and a comparison of the means 
and medians in Table 1 shows this to be true, in varying degrees, 
for eight of the ten scales. Revising the wording of the scale categories 
to reduce skewness in the five scales showing unsatisfactory 
reliabilities would probably improve their discrimination, as was 
found in modifying similar scales (3). 
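
The count of eight skewed scales can be checked directly against Table 1. The sketch below copies the psychology means and medians from that table and treats a mean lying below its median as a rough sign of negative skew; that rule of thumb and the names used are assumptions made for illustration.

```python
# (mean of means, median of means) for PRSI scales 1-10, from Table 1.
TABLE_1 = {
    1: (83.5, 86.5),
    2: (89.0, 89.5),
    3: (91.9, 92.3),
    4: (85.5, 86.4),
    5: (66.6, 66.7),
    6: (79.6, 81.5),
    7: (77.7, 80.0),
    8: (79.2, 77.6),
    9: (87.7, 89.4),
    10: (67.8, 67.3),
}


def negatively_skewed_scales(table):
    """Return the scales whose mean falls below the median."""
    return [scale for scale, (avg, med) in sorted(table.items()) if avg < med]


skewed = negatively_skewed_scales(TABLE_1)
```

Only scales 8 and 10 have means above their medians, so the list contains the remaining eight scales.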


TABLE 1. NORMATIVE DATA ON MEAN RATINGS OF ELEVEN INTRODUCTORY 
PSYCHOLOGY INSTRUCTORS ON THE PURDUE RATING SCALE FOR 
INSTRUCTION (AVERAGE NUMBER OF RATERS = 42.8) AND 
COMPARISON WITH PURDUE NORMS 

                          Psychology Norms                          Purdue Norms 
Purdue   Mean of   Median of   Standard        Reliability        Median of   Reliability: 
Scale    Means     Means       Deviation   Intraclass  Generalized  Means     Generalized 
                               of Means 
  1      83.5      86.5          7.7         0.91        0.91        90         0.92 
  2      89.0      89.5          3.6         0.82        0.61        87         0.92 
  3      91.9      92.3          3.9         0.58        0.54        86         0.86 
  4      85.5      86.4          4.7         0.81        0.64        85         0.91 
  5      66.6      66.7         13.7         0.96        0.93        79         0.93 
  6      79.6      81.5          6.6         0.88        0.75        83         0.90 
  7      77.7      80.0          8.6         0.91        0.90        84         0.92 
  8      79.2      77.6          5.8         0.76        0.65        83         0.92 
  9      87.7      89.4          7.3         0.92        0.91        92         0.94 
 10      67.8      67.3          8.7         0.92        0.84        78         0.91 











Comparing the Horst reliability estimates reported by Remmers 
and Baker (9, p. 4) with the present data indicates a generally 
higher reliability of the scales as used with the Purdue sample. The 
lessened reliability in our sample is attributable to two factors: (a) the previously 
mentioned departmental control over the testing and evaluative 
procedures used by the instructors, thus reducing inter-instructor 
variability in the eyes of the students, and (b) the greater homo- 
geneity of both course and instructional content between instruc- 
tors, since the data were obtained from a limited sample of instruc- 
tors teaching a specific course in a single university department. 
In view of the tremendous influence of these factors in reducing 
instructor heterogeneity in our sample, the fact that PRSI scales 
1, 5, 7, and 9 yielded reliabilities quite comparable to those found 
with the heterogeneous Purdue sample is a tribute to the discrimi- 
natory ability of both students and scales. 

Two suggestions are offered based upon the above results. First, 
that serious consideration be given to revising the wording of PRSI 
scale categories to achieve less skew in the distribution of ratings. 
Secondly, that ratings be combined into a smaller number of independent 
‘scores’ for each instructor, yielding evaluations that would be more reliable. 
A suggested line of attack here might be the factor analytic ap- 
proach reported by Smalzreid and Remmers (10). 

In conclusion, these data suggest that the Purdue norms be used 
cautiously in evaluating the mean PRSI ratings for any group of 
instructors. Variations in subject matter area and in characteristics 
of the course taught may seriously invalidate both the percentile 
norms and scale reliabilities reported in the PRSI Manual. These 
norms should be used only when an obtained sample of ratings 
can be shown to be comparable to the Purdue data. 


BIBLIOGRAPHY 


1) A. W. Bendig. “The reliability of letter grades.” Educational & 
Psychological Measurement, 13, 1953, 311-321. 

2) A. W. Bendig. “The use of student rating scales in the evaluation 
of instructors in introductory psychology.” Journal of Educational 
Psychology, 43, 1952, 167-175. 

3) A. W. Bendig. “A statistical report on a revision of the Miami 
Instructor Rating Sheet.” Journal of Educational Psychology, 43, 1952, 
423-429. 

4) A. W. Bendig. “A preliminary study of the effect of academic level, 
sex, and course variables on student rating of psychology instructors.” 
Journal of Psychology, 43, 1952, 21-26. 

5) A. W. Bendig. “The effect of level of achievement upon students’ 
instructor and course ratings in introductory psychology.” Educational 
& Psychological Measurement, 13, 1953, 437-448. 

6) R. L. Ebel. “Estimation of the reliability of ratings.” Psychometrika, 
16, 1951, 407-424. 

7) A. S. Goodhartz. “Student attitudes and opinions relating to teaching 
at Brooklyn College.” School and Society, 68, 1948, 345-349. 

8) Paul Horst. “A generalized expression for the reliability of measures.” 
Psychometrika, 14, 1949, 21-31. 

9) H. H. Remmers and P. C. Baker. Manual of Instructions for the Purdue 
Rating Scale for Instruction. Lafayette, Indiana: Division of Educational 
Reference, Purdue University, 1952. 

10) N. T. Smalzreid and H. H. Remmers. “A factor analysis of the 
Purdue Rating Scale for Instruction.” Journal of Educational Psychology, 
34, 1943, 363-367. 














BOOK REVIEWS 


Higher Education in the Forty-eight States: A Report to the Governors’ 
Conference. Chicago: The Council of State Governments, 1952, 
pp. 317. $5.00. 


At the 1952 meeting of the Governors’ Conference the Council 
of State Governments reported on an extensive study of higher 
education in the forty-eight states. The present volume presents 
this report with five chapters of interpretive text and over one 
hundred pages of detailed tables. The text is written in a style 
suited to describe higher education in this country to governmental 
officials who are not educational experts. At the same time, it con- 
tains a wealth of interpretation and extensive statistics of very 
great value to the specialist in educational administration. 

The five chapters deal successively with the history of American 
higher education, programs in a broad sense but not including in- 
ternal curricular or administrative problems, finances from the 
point of view of expenditures and income, and, finally, organization 
in the sense of the governmental controls of public institutions 
and the place of such institutions in the state governmental struc- 
ture. 

Comparisons among states in relation to the support of higher 
education are facilitated by a technique of determining for the 
states a per mill rate that each state has in relation to the total for 
the continental United States. The states vary widely in absolute 
and ratio figures on all variables considered. The proportion of 
state support for higher education has varied over the years. Al- 
though absolute amounts have increased, the percentage of total 
income from state sources has decreased from about thirty-five 
per cent in 1918 to twenty-seven per cent in 1950. 

In the last chapter, on organization, there is an enlightening 
analysis of the part played by legislative and administrative officials 
in the control of state educational institutions, and of the legal and 
practical authority of institutional boards of trustees. The varia- 
tion here follows a pattern of differences among the states which is 
evident in almost all aspects of state government. 

At no place do the authors of this report attempt evaluative 
comparisons. Their function is to describe the status quo and to 
indicate something of the history which has preceded it. This task 
has been done extremely well. The basic data are here available 
for educational or governmental evaluation. C. M. LOUTTIT 
University of Illinois 


MIRIAM FORSTER FIEDLER. Deaf Children in a Hearing World: 
Their Education and Adjustment. New York: The Ronald 
Press Co., 1952, pp. 320. $5.00. 


For two hundred years the education of deaf children has been 
one of the most highly specialized and most controversial areas in 
education. Since the time of Rudolph Pintner many psychologists 
have been interested in the intelligence, personality, and adjust- 
ment of deaf persons who are so obviously in an environment which 
they often share in a very limited fashion because of barriers of 
communication. If we add to the number of educators and psy- 
chologists the parents of the more than twenty thousand hearing- 
handicapped children now in special schools and classes, as well 
as the puzzled, often frustrated or panic stricken parents of pre- 
school age deaf children, it will total a goodly number of potential 
readers for any publication with the promising title listed above. 

While all of these readers could profit from a perusal of the book, 
the generalizations implied in the title are somewhat misleading. 
The book is more precisely the report of a four-week session at the 
Vassar Summer (1949) Institute for Family and Community Liv- 
ing, attended by eleven young (two and one-half to nine years) 
hearing handicapped (thirty-seven to eighty-five db loss) children 
and their parents. The children spent the greater part of the time 
with hearing children of similar age groups, receiving special 
help in speech, speech reading, and auditory training. Most of 
these children could be reached with amplified sound in the speech 
range and will probably be educated as hard of hearing rather than 
as deaf. Audiograms are presented for all, but as the author states 
(p. 19) they are only roughly comparable. A section on the validity 
and reliability of hearing tests, the EEG and GSR techniques 
would have added to the scientific aspects of the book. 

The discussion on ‘sense training’ fails to bring out the real pur- 
pose of these activities in the first weeks of a small deaf child’s 
schooling. Sense training is not an end in itself, but something which 
the deaf child can do successfully in coöperation with an adult. 
This coöperation is a necessary step if speech and language training 
are to begin, for even though it may be an “antiquated theory” 
(p. 9) there is historically no record of a deaf child’s developing 
conversational speech and language without formal training. The 
teaching profession would welcome evidence that it could happen. 

These children were young, and the young deaf child will never 
again be so like his hearing counterpart. The plea is for ‘adjust- 
ment’, rather than for emphasis on communication skills. As the 
deaf child grows into adolescence and adulthood, what is going to 
be the measure of his adjustment? Surely his adequacy in speech 
and language will play an important part. This is the old dilemma 
of the deaf child’s education, and this book does not give a com- 
plete solution. The things that the child needs for good adjustment 
at sixteen or sixty are the things which the hearing child learns 
without effort and involuntarily before he is six, but which the 
truly deaf child must learn through voluntary attention to visual 
cues at any age. 

This book is not a report of a controlled experiment, but a sincere 
attempt to see what would happen under the circumstances. We 
may assume that staff members not previously acquainted with 
children with severe hearing losses learned a great deal about such 
children—a highly desirable outcome in public relations and better 
understanding of deaf children. ELOISE KENNEDY 
University of Illinois 


LLOYD A. JEFFRESS. Cerebral Mechanisms in Behavior. The Hixon 
Symposium. New York: John Wiley and Sons, Inc., pp. 311. 


Cerebral Mechanisms in Behavior is a volume containing the content 
of a symposium on this topic held at the California Institute of 
Technology during the week of September 20 to September 25, 
1948. The symposium was sponsored by the Hixon Fund. Lloyd A. 
Jeffress, who writes the preface to the volume, was selected by the 
Institute to spend the year 1947 in research on audition and to help 
organize the 1948 symposium. The selection of contributions as 
well as contributors will interest people in the field. They include 
contributions by John von Neumann on the general and logical 
theory of automata; an interesting and well-considered chapter by 
Warren S. McCulloch called “Why the Mind Is in the Head”; 
a chapter on the problem of serial order in behavior by K. S. 
Lashley; a chapter by Heinrich Klüver on functional differences 
between the occipital and temporal lobes; a chapter by Wolfgang 
Köhler on relational determination in perception, and a chapter 
by Halstead on brain and intelligence. Like the atomic scientists, 
some of the contributors in this volume too have a good sense of 
perspective, a realization of the significance of what they do, a sense 
of limitation of it, and even at times a sense of humor about it. 

There are some very rich give-and-takes in the volume which is 
exceedingly well edited and in good taste. The chapter that will 
particularly interest psychologists is the contribution by Henry W. 
Brosin on the symposium from the viewpoint of a clinician. Em- 
phasized by Brosin in this chapter is the need for learning a com- 
mon vocabulary by living and working together with the assump- 
tion that reading and experimentation in isolation are inadequate. 
Brosin’s impression is ‘‘that the logic of determinism and the prag- 
matic study of relations and operations are adequate scientific 
working methods for the skeptical clinician who deals necessarily 
with poorly defined complex conditions, even though he himself 
has not been able to create a useful vocabulary for his needs.” 
Brosin ends his contribution with a reference to Boring’s often 
quoted comment that psychology lacks a great man. As far as he is 
concerned, Freud is that man, and the exploitation of the free- 
association techniques and the inquiry into the meaning of psycho- 
dynamic social patterns can keep us occupied productively for 
many years. He suggests in his last sentence that psychology may 
find its great man in the person who can utilize the Freudian con- 
cepts and bring them into closer approximation with the ideals of 
Wundt. The impression of the reviewer is that this conclusion repre- 
sents a misinterpretation of present observable trends and that 
the promise of growth appears to be in the application of experi- 
mental methods in the exploration and validation of materials and 
concepts from psychoanalysis, cultural anthropology, and sociology. 

The volume contains the well-considered deliberations of men 
who know what they are talking about and how to write about it, 
and should prove profitable reading to psychologists and people 
from the biological sciences interested in the topics of this symposium. 
H. MELTZER 
Psychological Service Center 
St. Louis, Missouri 